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OPTIMAL  CONTROL  OF  ADMISSION,  ROUTING,  AND  SERVICE 
IN  QUEUES  AND  NETWORKS  OF  QUEUES:  A  TUTORIAL  REVIEW 


Shaler  Stidham,  Jr. 

North  Carolina  State  University  at  Raleigh 
September,  1984 

Abs±nao± 

Queueing  models  can  be  useful  in  the  analysis,  design,  and 
control  of  production,  transportation,  communication,  and 
logistics  systems.  Using  the  theory  of  Markov  decision  processes 
and  the  inductive  techniques  of  dynamic  programming,  normative 
models  have  been  developed  for  optimal  control  of  admission, 
routing,  and  servicing  of  jobs  in  queues  and  networks  of  queues. 
We  review  some  of  these  models  in  a  unified  format,  beginning 
with  single-facility  models  and  then  moving  on  to  models  for 
networks  of  queues.  The  emphasis  is  on  using  induction  (value 
iteration)  to  establish  the  qualitative  structure  of  optimal 
control  policies.  We  compare  the  resulting  policies  to  some  ad 
hoc  control  rules  that  have  been  proposed  in  the  literature. 
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several  stages  of  disassembly,  inspection  and/or  replacement  of 
components.  In  communication  systems,  a  series  of  queues  may 
provide  a  model  for  a  "virtual  channel"  --  a  sequence  of  linked 
communication  channels  —  between  the  source  and  destination  of  a 
certain  class  of  messages  (cf.  Lazar ( 1 9S3) ) . 

It  may  be  the  case  that  different  jobs  require  different 
combinations  of  services  in  different  sequences.  There  could  be 
feedback  of  certain  jobs,  because  of  the  need  for  rework.  In 
such  cases  the  appropriate  queueing-network  model  will  be  more 
complicated,  with  multiple  branches  and  combined  ser i es-paral 1 e 1 
structure.  The  classical  industrial  application  for  such  a  more 
general  network  model  is  a  j.nh  shop.,  and  indeed  the  seminal 
theoretical  paper  on  networks  of  queues  ( Jackson ( 1 963) )  has  these 
words  in  the  title.  Subsequently,  Jackson's  model  and 

I 

generalizations  have  been  successfully  applied  to  performance 

I 

evaluation  in  computer/communication  systems  (see,  e.g., 

K1 e i nrock < 1 976) ) .  More  recently,  the  industrial  engineering 

community  has  recognized  the  utility  of  the  ne tworks-of -queues 
model  for  analysis  of  flexible  manufacturing  systems:  systems 

consisting  of  automated  manufacturing  cells,  capable  of 

efficiently  processing  a  variety  of  jobs  requiring  processing  by 

I 

different  combinations  of  machines  (see,  e.g,,  Buzacott  and 

Yao( 1983) ) . 

For  the  most  part,  queueing-network  models  have  been 
descriptive,  rather  than  normative.  That  is,  they  have  provided 
a  tool  for  estimating  operating  characteristics  or  measures  of 
effectiveness,  such  as  congestion  levels  or  throughput,  of 
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existing  or  proposed  systems,  operating  according  to  spec  i -tied 
policies.  The  task  o-f  selecting  the  best  design  or  the  best 
control  policy  has  been  left  to  the  system  operator.  Recently, 
however,  more  and  more  researchers  have  turned  to  mathematical 
models,  not  only  for  description,  but  also  to  help  make  crucial 
design  and  control  decisions. 

Optimal  design  is  an  integral  part  of  queueing-network 
models  for  automatic  transfer  lines,  where  the  location  and  size 
of  buffers  for  in-process  inventories  are  design  variables  (Ho  et 
a  1  (  1  979 )  ,  A1  t  i  ok  (  1 982)  ,  Altiok  and  St  i  dham(  1 983) )  .  A  common 
approach  is  to  use  simulation  or  an  analytical  Markov  model  to 
evaluate  the  costs  and  benefits  of  a  fixed  buffer  configuration, 
and  then  use  a  gradient-search  algorithm  to  move  toward  a  local 
(and,  with  luck,  global)  optimum.  Other  possible  design  varia¬ 
bles  are  the  number  of  servers  and/or  the  service  rates  at  each 
node  (e.g.,  work  station,  repair  facility)  of  the  network. 
Gross,  Miller,  and  Sol and( 1982)  used  a  queueing-network  model  to 
represent  a  system  in  which  repairable  items  are  processed  at 
either  a  depot  or  field  station  and  there  is  the  possibility  of 
providing  spares.  The  number  of  spares  and  the  number  of  servers 
at  each  repair  facility  were  design  variables  and  the  objective 
was  to  minimize  the  cost  of  providing  spares-  and  servers,  while 
maintaining  a  desired  level  of  operational  readiness  (probabi¬ 
lity  that  the  required  number  of  items  are  operational).  An 
integer-programming  algorithm  was  used  to  find  the  optimal  de- 
s  i  gn  . 

In  a  design  problem,  the  decision  variables,  once  fixed, 
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remain  so  throughout  the  planning  horizon.  By  contrast,  i  f  the 
decision  variables  can  be  adjusted  as  the  system  status  changes, 
then  we  have  a  conlccd.  problem.  Control  may  be  exercised  in  a 
queueing  network  by  varying  the  arrival  or  service  rates,  turning 
servers  on  or  off,  or  changing  job  priorities  or  routings.  By 
doing  so,  one  can  balance  the  "bad”  (congestion)  with  the  "good" 
(throughput).  As  an  example,  consider  the  communication  system 
illustrated  in  Figure  2,  (This  example  captures,  in  simplified 
form,  several  of  the  important  issues  in  the  control  of  flow  and 
routing  in  communication  systems.) 


Figure  2 .  Simple  Commu  n i c  a  t i on  Ex  amp  1 e . 


There  are  three  cities  (A,  B,  and  C) ,  with  direct  channels 
labelled  i  ,  2  ,  and  3  ,  linking  A  to  B  ,  B  to  C  ,  and  A 
to  C  ,  respectively.  Each  channel  transmits  messages  one  at  a 
t i me .  Messages  waiting  to  be  transmitted  are  placed  in  an 
infinite-capacity  buffer  in  front  of  the  channel .  There  are 
three  classes  of  messages  (jobs):  originating  in  A  and  des¬ 
tined  for  B  (class  1),  originating  in  B  and  destined  for  C 
(class  2),  and  originating  in  A  and  destined  for  C  (class  3). 
The  system  has  no  control  over  messages  of  class  1  or  2  :  all 
messages  of  each  class  must  be  sent  over  the  corresponding  direct 
channel .  Messages  of  class  3,  however,  may  be  controlled  in  two 
ways:  (i)  by  accepting  or  rejecting  a  message  when  it  "arrives" 
(is  generated  at  city  A);  ( i i )  by  choosing  whether  to  route  it 
directly  to  city  C  via  channel  3,  or  indirectly  via  channels  i 
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an  d  2  * 


In  the  latter  case,  the  message  competes  with 


local 


traffic  (messages  of  classes  1  and  2). 

The  corresponding  network  of  queues  (see  Figure  3)  has  three 


Figure  3.  Queueing-Network  Model  for  Communication  System. 


nodes,  one  corresponding  to  each  channel  (server)  and  its  buffer. 
The  decisions  (accept  us.  reject,  route  via  channels  1  and  2  us. 
uia  channel  3)  are  indicated  by  "toggles'1. 

What  can  one  say  about  "optimal",  or  at  least  "good",  con¬ 
trol  policies  for  this  probl em?  On  the  one  hand,  one  would  like 
to  get  as  many  messages  as  possible  through  the  system  (maximize 
throughput).  This  could  be  accomplished  by  admitting  each  class- 
3  arrival .  On  the  other  hand,  admitting  a  class-3  message  has 
associated  "costs".  One  cost  is  reflected  in  the  time  it  takes 
the  message  to  reach  its  destination,  which  depends  on  which 
route  is  chosen  and  how  many  messages  are  ahead  of  it  in  the 
buffer(s).  Economists  would  call  this  an  11  internal  effect". 
Another  cost  is  the  "congestion"  which  is  added  to  the  system  by 
admitting  the  message,  as  reflected  in  the  additional  delays 
suffered  by  later  messages  because  of  its  presence.  The  economic 
term  for  this  type  of  cost  is  an  "external  effect".  A  rational 
control  policy  for  admission  and/or  routing  of  messages  must 
balance  these  benefits  and  costs.  Intuition  suggests  that  such  a 
pol i cy  should  have  (at  least)  the  following  (mono ton i c i ty)  pro¬ 
perties:  if  it  accepts  an  arriving  message  in  a  given  state, 
then  it  should  also  accept  if  one  or  more  messages  are  removed 
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Often  it  is  useful  to  view  the  components  of  production, 
transportation,  communication,  or  logistics  systems  as  queu&s ,  in 
which  j.obs  (customers)  are  processed  by  one  or  more  secures.  The 
jobs  could  be  parts  or  subassemblies,  vehicles,  messages  or 
computer  programs,  or  repairable  items.  The  servers  could  be 
work  stations,  traffic  signals  or  road  segments,  communication 
channels  or  computer  CPU's,  or  maintenance/repair  facilities. 
Often  there  are  many  such  service  facilities,  linked  together  by 
paths  along  which  jobs  may  travel  from  one  facility  to  another. 

An  abstract  model  for  such  systems  is  a  naiwook.  of  quauas , 
and  such  models  have  been  increasingly  recognized  as  useful  tools 
for  understanding  the  behavior  of  complex  service  systems.  Per¬ 
haps  the  simplest  network  consists  of  a  number,  m  ,  of  queues  in 
series,  in  which  the  output  of  queue  i  is  the  input  to  queue 
i+1  .  A  flow=shop  production  system,  such  as  an  assembly  1 ine  or 
automatic  transfer  line,  has  this  structure  (see  Figure  1).  In 

Figure  1.  Queues  in  Series. 

such  a  system,  each  job  must  be  processed  at  each  of  the  work 
stations  (numbered  i  =  l,2,...,m  )  in  the  same  order.  Each  work 
station  consists  of  one  or  more  servers  (machines  and/or  workers) 
all  capable  of  performing  the  same  task,  preceded  by  a  buffer  or 
storage  area  where  jobs  wait  for  processing.  Obviously,  such  a 
model  is  not  limited  to  production  and  assembly  operations,  but 
may  also  apply  to  maintenance  and  repair  facilities  in  which 
incoming  jobs  (e.g.,  aircraft,  vehicles)  of  identical  or  similar 
configuration  scheduled  for  routine  maintenance  must  go  through 
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•from  a.nv  one  of  the  nodes;  if  it  is  preferable  to  route  a  c  1  ass¬ 
et  message  via  node  3  in  a  given  state,  then  it  will  remain 
preferable  to  do  so  if  one  or  more  messages  are  removed  from  node 
3  and/or  added  to  nodes  1  and  2. 

At  the  very  least,  one  expects  a  mathematic  control  model  to 
be  capable  of  confirming  or  denying  the  validity  of  such  in¬ 
tuition.  Beyond  that,  such  a  model  should  lead  to  efficient 
numerical  algorithms  for  computing  the  control  parameters  of  an 
optimal  policy  —  specifically,  the  boundary  of  the  11  acceptance 
region11  and  the  boundary  between  the  regions  of  the  state  space 
where  it  is  optimal  to  route  to  nodes  1  and  2  vs.  node  3  <the 
11  swi  tch  i  ng  curve" )  . 

For  another  example  of  a  control  problem  involving  a  network 
of  queues,  let  us  return  to  the  ser i es~of -queues  model  for  a  flow 
shop  as  pictured  in  Figure  1.  Suppose  that  it  is  possible 
to  control  (dynamically  vary)  the  rates  at  which  the  servers  at 
various  nodes  work,  in  response  to  changing  congestion  levels. 
For  example,  one  might  want  to  turn  a  server  off  when  the  down¬ 
stream  buffer(s)  have  accumulated  a  large  number  of  Jobs  or  when 
the  number  of  jobs  in  the  upstream  buffer(s)  is  small.  "Just-in¬ 
ti  me  11  production  policies  like  the  Japanese  .kanban  are  a  special 
case  of  this  type  of  policy.  Again,  one  would  hope  that  a 
ma themat i cal  optimization  model,  based  on  a  plausible  benefit- 
cost  structure,  would  enable  one  to  test  the  validity  of  such 
operating  principles  and  efficiently  calculate  the  parameters  of 
the  associated  control  policies. 
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I'll e  primary  goal  of  this  paper  is  to  survey  the  progress 
that  has  been  made  toward  accomplishing  these  two  goals,  in  the 
general  context  of  mathematical  models  for  control  of  networks  of 
queues.  We  focus  attention  on  models  based  on  Markov  decision 
processes,  using  the  inductive  ideas  embodied  in  dynamic 
pr ogramm i ng  as  a  tool  for  characterizing  the  structure  of 
optimal  policies  as  well  as  calculating  their  parameters.  We 
begin  by  illustrating  the  economic  assumptions  and  analytic 
technique  in  the  context  of  some  simple,  one-facility  models,  and 
then  move  on  to  mu  1 t i -f ac i 1  i ty  (queueing-network)  models.  The 
models  considered  allow  control  of  admission,  routing,  and/or 
servicing  of  .jobs.  For  a  more  detailed  survey  that  concentrates 
on  control  of  admission  and  routing,  see  St i dham< 1 984) .  Earlier 
comprehensive  surveys  may  be  found  in  Sobel(1974),  St i dham( 1 974) , 
and  Crabill,  Gross,  and  Magaz i ne ( 1 977) . 

I  .-  Admission  Conitol  ±o  a  J2ue.ua 

We  first  consider  a  simple,  single-facility  model  with 
control  of  admission  of  customers.  It  is  a  special  case  of  a 
model  for  exponential  congestion  systems,  studied  by  Lippman  and 
St i dham( 1 977)  and  is  illustrated  below  in  Figure  4.  Jobs  arrive 


Figure  4.  M/M/1  Queue  with  Control  of  Admission. 


at  a  single-server  facil i ty  according  to  a  Poisson  process  with 
mean  arrival  rate  A  (jobs  per  unit  time).  Equivalently:  the 
interarrival  times  between  jobs  are  i .  i .d.  (independent  and  i den- 
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tically  distributed)  with  an  exponential  distribution  with  mean 
1/ X  .  )  (See,  e.g.,  Ross(l?70>  -for  a  discussion  of  properties  of 
the  Poisson  process  and  exponential  distribution.)  The  system 
operator  controls  the  arrivals  by  deciding  whether  to  accept 
(action  a  =  1  )  or  reject  (action  a  =  0  )  an  arriving  job. 
Accepted  jobs  join  an  infinite-capacity  queue  and  wait  for  ser¬ 
vice.  There  is  a  single  server  who  serves  jobs  one  at  a  time, 
with  service  times  that  are  i.i.d.  with  an  exponential  distribu¬ 
tion  with  mean  1  / fx  .  The  shorthand  for  this  is  to  say  that  we 
have  an  exponential  server  with  mean  service  rate  /a.  (jobs  per 
unit  time).  In  the  literature  on  queues,  a  system  like  that 
illustrated  in  Figure  4,  but  with  no  restriction  on  entry  of 
jobs,  is  referred  to  as  an  M/M/1  queue.  The  "M"  in  the  first- 
position  stands  for  the  "memory less"  (exponential)  distribution 
of  interarrival  times.  The  "M"  in  the  second  position  tells  us 
that  the  service- time  distribution  is  also  exponential.  The  "1" 
in  the  third  position  stands  for  "one  server".  We  extend  this 
terminology  to  control  models,  so  that  the  model  under  considera¬ 
tion  becomes  an  "M/M/1  queue  with  control  of  admission". 

For  clarity  of  exposition,  we  assume  a  simple  benefit/cost 
structure  reflecting  the  fact  that  throughput  is  "good"  and 
congestion  is  "bad".  Each  admitted  job  generates  a  fixed  reward 
(utility)  r  .  There  is  a  waiting  cost  h  per  job  per  unit  time 
in  the  system  (i.e.,  in  the  queue  plus  in  service).  Equiva¬ 
lently,  by  analogy  with  inventory-control  problems,  we  can  say 
that  there  is  a  cost  of  holding  jobs  h*i  per  unit  time  while 
there  are  i  jobs  in  the  system.  Future  rewards  and  costs  are 
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continuously  discounted  at  rate  o<  >  0  ,  so  that  the  present 

value  ot  a  net  benefit  x  received  at  time  t  is  x>expC-#<t>. 

The  objective  of  the  system  operator  is  to  maximize  the  total 

expected  discounted  net  benefit  over  an  infinite  time  horizon.* 

^Discounting  reflects  the  time  preferences  of  a  rational  economic 
decision  maker  and  makes  it  possible  to  compare  present  and 
future  benefits  and  costs.  An  alternate  optimality  criterion  is 
long-run  average  net  benefit  per  unit  time.  Aver age -op t imal  con¬ 
trol  policies  can  be  derived  from  U  -d i scoun t-op t i mal  policies  by 
letting  e<  approach  zero  (see,  e.g.,  Ross(1970),  Lippman  and 

St i dham< 1 977) > .  We  shall  therefore  confine  our  attention  to  the 
d i scoun ted-ne  t-benef it  criterion. 

Our  goal  is  to  characterize  the  structure  of  an  optimal 

control  policy  (rule  for  choosing  actions)  and  develop  efficient 
techniques  for  computing  its  parame ters .  To  this  end,  we  first 
use  concepts  from  dynamic  programming  < Be  1  1  man <  1 958) ,  Ross(1970)) 
to  derive  a  functional  equation  satisfied  by  the  optimal  value 
function,  i ) :=  maximum  total  expected  discounted  net  benefit 

over  an  infinite  horizon,  starting  from  state  i  .  The  Principle 
of  Optimality  of  dynamic  programming  says  that,  from  each 

starting  state  i  ,  an  optimal  policy  will  choose  an  action  a( i ) 

=  a  that  maximizes  the  sum  of  the  discounted  net  benefit  earned 

until  the  next  observation  point  and  the  present  value  of  the 
discounted  net  benefit  earned  after  the  next  decision  point, 
assuming  that  we  follow  an  optimal  policy  from  whatever  state  j 
we  enter  at  that  point.  For  the  problem  at  hand,  it  follows  that 
the  optimal  value  function  V< i >  satisfies  the  optimality  equa¬ 
tion  (  i  >.0 )  : 

i  )  =  [-  h-i  +  'XmaxCr  +  V<i  +  l),V(i>>  +  /i  V<  i -1  >  J/<*+/i+«>  ,  (1) 
where  V(-l)  =  V(0)  .  A  rigorous  derivation  of  the  dynamic- 
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programm i ng  optimality  equation  under  general  conditions  satis¬ 
fied  by  our  problem  may  be  -found  in  Schal(1975),  Ber tsekas< 1 980 ) , 
or  Wh i t t 1 e < 1 983) .  For  the  problem  at  hand,  there  are  several 
possible  heuristic  derivations  o-f  (i),  one  of  them  being  the 
f  o 1  lowing. 

We  observe  the  system  only  at  transitions  (changes  of  state 
caused  by  arrivals  or  service  completions).  The  time  between 
transitions  is  the  minimum  of  two  independent  exponential  random 
variables,  the  time  until  the  next  arrival  and  the  time  until  the 
next  service  completion,  and  thus  is  itself  exponentially  distri¬ 
buted  with  parameter  X  +  /t  ,  the  sum  of  the  arrival  and  service 
rates.  Until  a  transition  occurs,  given  that  we  are  in  state 
i  ,  we  incur  holding  cost  at  the  constant  rate  h*i  .  Applying 
the  discount  factor  exp<-«tt>  for  cost  accruing  at  time  t  and 
integrating  over  the  exponentially  distributed  time  until  the 
next  transition  gives  the  term  -  h '  i  /(X+/o+<<)  ,  the  expected 
discounted  holding  cost  until  the  next  observation  point.  The 
expected  discount  factor  over  the  interval  between  now  and  the 
next  observation  point  is  (  X  +  )/<  X  +  /*+«<  )  .  The  next  observa¬ 
tion  is  at  an  arrival  or  a  service  completion  with  respective 
probab  ilities  X/(X  +  /*)  and  /  (  X  +  /c  )  .  These  express  i  on  s 
follow  from  well-known  properties  of  exponential  distributions.# 

#Note  that  because  V(-l)  =  U<0)  ,  the  optimality  equation  for 
state  0  implicitly  assumes  that  the  server  continues  to  operate 
when  no  customers  are  present,  but  that  any  service  completions 
are  fictitious,  in  effect  creating  "dummy"  transitions  from  state 
0  to  itself.  It  is  easy  to  verify  that  the  equation  for-  state 
0  is  equivalent  to 

U(0)  =  %  max  tr  +  1 ) ,  0 ) >/< X  +  «  )  , 
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the  equation  resulting  -from  imbedding  only  at  "real"  events 
(arrivals).  The  advantage  of  our  formulation  (often  referred  to 
in  the  literature  as  " un i f orm i z a t i on " >  is  that  it  makes  the  time 
between  observation  points  (although  random)  independent  of  the 
state  and  action.  The  resulting  optimality  equations  are 
structurally  equivalent  to  those  of  the  approximating  discrete- 
time  Markov  decision  process  resulting  from  observing  the  system 
at  fixed  time  intervals  of  length  dt  and  ignoring  multiple 
events  in  a  single  interval  (which  have  probability  o(dt)  ).  The 
equations  in  this  form  are  more  amenable  to  qualitative  analysis 
via  successive  approx i mat i ons ,  as  we  shall  see  presently.  Uni- 
formization  is  a  standard  technique  in  the  analysis  of  Markov 
chains.  It  usefulness  in  models  for  control  of  queues  was  recog¬ 
nized  by  L i ppman ( 1 975)  (see  also  Serf ozo( 1 97?) ) . 


We  can  use  the  optimality  equation  to  show  that  an  optimal 
policy  is  trionoionic ,  specifically  that  the  optimal  action  a(  i  ) 
is  non- i ncreas i ng  in  i  1  0  .  In  other  words,  an  optimal  admis¬ 
sion  pol icy  is  characterized  by  a  cnliical  number:  i*  such  that 
an  arriving  job  is  admitted  if  and  only  if  i  <  i*  .  By  adding 
and  subtracting  the  term  V< i )  on  the  right-hand  side,  we  can 
rewrite  the  optimal i ty  equation  (1)  in  equivalent  form 


V(f)  «  [ -h  *  i  +  +  ^ttVC  i-l) 

+  “X  max  Cr  -  CV<  i  )  -V  <  i  +  1  )  ]  ,  0  >  3/ (  ^  + /t  +  *  ) 


(2) 


f  r  om 

wh  i ch  it  is  easy 

to 

see  that  an 

op t i ma 1  a dm i ss i on  p 

o  1  i  c  y 

will 

be  mono ton  i  c  if 

W(  i 

)  -  Y( i +1 ) 

is  non “dec re as i ng  in 

i  > 

that 

is,  if  KH  i  )  i 

s  concave.  In 

this  case ,  the  cr i 

t  i  ca  1 

number  w i 1 1  be  i*  —  minCi:  V< i >  -  W( i + 1 )  1  r >  .  The  quantity 
0( i )  -  V<i+1)  can  be  interpreted  as  the  total  cost  (including 
loss  of  net  benefit  to  future  arrivals)  caused  by  the  entry  of  a 
job  in  state  i  .  We  should  admit  a  job  if  and  only  if  this  cost 
is  smaller*  than  the  reward  r  . 


*Uur  convention  is  to  reject  in  case  cost  exactly  equals  reward. 
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Since  the  optimality  equation  (2)  only  de  -fines  V<  i  > 
implicitly,  it  does  not  yield  a  direct  proof  that  V<  i  >  is 
concave.  One  can,  however,  exploit  the  -fact  that  (2)  can  be 
solved  by  successive  approximations:  start  with  an  approximation, 

V  ,  of  the  optimal  value  function  V  and  insert  it  in  place  of 
0 

V  in  the  right-hand  side  of  (1)  ,  thus  generating  a  new 

approximating  function  V  .  Repeat  this  process,  defining  V 

1  n 

in  terms  of  U  (nil)  by  the  recursive  analogue  of  (1): 

n-1 


Un  <  i  )  =  [ -h  •  i  +  'XmaxCr  +  Vn-l<i+l),Vn-l<i)> 

+  /tV  <  i-1  )]/<  'X  +/*-+  «  >  . 

n-1 


<  3) 


The  theory  behind  the  convergence  of  V  to  V  ,  including 

n 

necessary  restrictions  on  V  ,  may  be  found  in  Schal<19?5), 

0 

St  i  dham<  1981)  ,  Uh  i  t  tl  e<  1983)  ,  van  der  Ulal  ( 1981 )  .  In  dynamic 

programming,  it  is  customary  to  call  this  approach  ua.liie 

i±£.ca±ian,  following  Be  1  1  man <  1  957)  .  V  can  also  be  interpreted 

n 

as  the  maximum  total  expected  net  benefit  if  the  system  is  to  be 

operated  for  only  n  stages  (observation  points)  and  then  shut 

down,  earning  a  terminal  (scrap)  value  according  to  a  given 

function  V  <j)  of  the  final  state  j  . 

0 


Value  iteration  can  be  used  as  the  basis  for  an  inductive 

proof  that  V  ,  and  hence  ,  is  concave.  It  is  intuitively 

n 

clear  that  V<  i  )  should  also  be  non-increasing  in  i  .  For 
technical  reasons,  having  to  do  with  the  boundary  at  state  i =0  , 
it  is  necessary  to  add  this  property  to  the  induction  hypothesis, 
which  thus  becomes: 
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i  1  0  . 


M  (i)  is  concave  and  non-increasing  in 


(4) 


n 

The  inductive  step  involves  showing,  via  equation  <3),  that  'v1 

n 

satisfies  (4)  whenever  V  does.  The  non-trivial  part  is 

n-1 

showing  that  concavity  of  a  function  g(  i  )  implies  concavity  of 
the  function 


f<i):=  max  tr  +  g(i  +  l),  g<i)J.  <5) 

This  was  done  by  L  i  ppman  ( 1  ?75)  .  The  function  =0  trivially 

0 

satisfies  (4)  and  therefore  is  a  suitable  starting  function  for 

the  induction.  Convergence  of  C  to  V  in  this  case  follows 

n 

from  Schal(1975).  In  the  process  of  verifying  the  optimality  of 
a  monotonic  policy  for  the  infinite-horizon  discounted  problem, 
we  also  verify  monotonicity  for  each  n-stage  problem  --  a 
property  that  may  be  of  interest  in  applications  to  problems 
with  a  finite  planning  horizon. 

Inductive  analyses  like  this,  based  on  the  11  preservat  i  on  of 
concavity"  through  transf ormat i ons  like  (5),  form  the  basis  for 
the  study  of  the  structure  of  optimal  policies  in  many  problems 
in  the  control  of  queues.  The  next  section  provides  another 
illustration  of  the  power  of  this  approach. 

Ee.macksj. 

1  .  £ni±±o.a±-numb££:  policies  and  ihein  nnmenical  jcompnlalion^ 

The  inductive  analysis  has  reduced  the  M/M/1  adm i ss i on-con trol 
problem  to  one  of  finding  the  optimal  critical  number  i*  .  This 
is  a  one-dimensional  optimization  problem:  among  the  class  of 
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M/M/i/n  systems ,  find  the  system  capacity  n  =  i*  that  yields 
the  maximum  value.  Naor(19S9)  used  this  approach  in  his  seminal 
paper  on  admission  control.  For  M/h/l/n  systems  a  closed-form 
expression  is  available  for  the  long-run  average  net  benefit, 
which  can  be  shown  to  be  uni  modal  in  n  ,  so  that  a  local  maximum 
is  a  global  maximum.  Naor  exploited  this  property  to  give  neces¬ 
sary  and  sufficient  conditions  for  n  =  i*  ,  when  the  optimality 
criterion  is  long-run  average  net  benefit.  This  approach  was 
extended  to  M/M/c  and  more  general  exponential  systems  by  Knud- 
sen(1972)  and  Knudsen  and  S t i dham( 1 976) .  Systems  with  more  gene¬ 
ral  arrival  process  or  service- time  distribution,  with  attention 
restricted  to  cr i t i cal -number  policies,  have  been  studied  by 
Adler  and  Haor(1969),  Simonov i ts< 1976) ,  Bal achandran  and  Schae¬ 
fer  (1979),  and  Rue  and  Rosensh i ne < 1 981 ) . 

Of  course,  for  discounted  problems  one  possible  technique 
for  finding  i*  is  simply  to  apply  value  iteration  as  a  numeri¬ 
cal  algorithm  to  the  optimality  equation  ( 1 ) . *  Although  not  a 

#The  state  space  must  first  be  truncated  to  a  finite  set,  in 
order  for  the  algorithm  to  be  finite. 

particularly  fast  algorithm  by  itself  (especially  when  is 
close  to  zero),  value  iteration  can  be  made  to  converge  quite 
rapidly  with  the  incorporation  of  bounds,  extrapolations,  elimi¬ 
nation  of  subopt imal  actions,  and  transformations.  (See,  e.g., 
van  Nunen  and  Wesse 1 s( 1 979)  for  a  survey  of  the  different  va¬ 
riants  of  value  iteration.)  Policy  iteration  ("Howard's  algo¬ 
rithm")  is  another  alternative.  A  variant  of  policy  iteration, 
which  restricts  attention  to  cr i t i cal -number  policies  and  ex- 
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ploits  the  special  structure  of  the  optimality  equation  tor 
a dm i ss i on -con tr ol  problems,  was  developed  by  van  Nunen  and  Puter- 
man ( 1 980) , < 1 982) .  Wijngaard  and  St i dham( i 983)  have  developed  an 
efficient  algorithm  for  a  class  of  Markov  decision  processes  that 
includes  the  a dm i ss i on -con trol  problem. 

2.  Suhmodulaciix  and  moixotariLc.  policies.*.  We  see  from  (2)  that 
our  maximization  problem  i s  of  the  form 

f ( i )  =  max  g ( i  , a)  ,  <  6 > 
a 

where  in  this  case  g(i,a)  *=  a  r  +  V<  i+a)  ,  The  theory  of 
submodular  (and  supermodu 1 ar )  functions  (Topk i s( i 978) )  provides  a 
set  of  tools  for  showing  monotonicity  of  the  optimizing  action  in 
problems  of  this  form.  A  function  g(i,a)  is  called  sahmodulai: 
(snpedmoriulan)  in  (i,a)  if  g(i,a^)  -  g < i  , a )  is  non-increasing 
in  i  ,  for  al 1  a/  >  a  .  If  a( i )  is  defined  as  the  (smallest) 
maximizing  action  in  (6),  then  it  is  easy  to  see  that  a<  i  )  is 
non-increasing  (non-decreasing)  in  i  if  g(i,a)  is  submodular 
( supermodu 1 ar ) .  (Symmetric  statements  hold  for  minimization 
problems.)  In  the  present  problem,  submodularity  of  g(i,a)  = 
a  r  +  V ( i + a )  foil ows  directly  f r om  c on cay i t y  of  M( i )  .  This  i s 
often  the  case  in  queueing-control  problems. 

3.  Ex±£iQsions  o±  iha  Induciiua  approach-*.  Variants  of  the  induc¬ 
tive  approach  can  be  used  to  prove  mono ton i c i ty  of  an  optimal 
admission-control  policy  for  systems  with  multiple  servers  or  a 
state-dependent  service  mechanism,  non-linear  (convex)  holding 
cost  rate  h(i)  ,  and/or  random  rewards  (Lippman  and  Stid- 
ham< 1 97?) > ,  with  a  general  i n tenarr i val - t i me  distribution,  batch 
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arrivals  with  random  baton  ci^e,  or  mixed-Erlang  service-time 
distribution  ( St i dham( 1 978) ,  Langen < 1 982) )  ,  with  random  environ¬ 
ments,  including  dependent  inter  arrival  times,  -fixed  time  hori¬ 
zons,  non-stat i onary  arrival  process,  multiple  job  types,  or 
partially  observable  processes  (Helm  and  Wal dmann ( 1 983) ) ,  with 
general  input  and  output  process  and  continuous  state  variable 
(Johansen  and  St i dham< 1 980 ) ) ,  and  with  charging  of  rewards  and 
costs  at  departure  (rather  than  arrival)  instants  (Johansen  and 
St i dham( 1 984) ) .  For  more  details  about  other  these  and  other 
single-facility  admission-control  models,  see,  e.g.,  Stidham  and 
Prabhu ( 1974) ,  Johansen  and  St i dham< 1980) ,  St i dham( 1 984) . 

2 .  £an±cn±  In  a  SinQla^Saniian  Guana 

Our  next  mode  1  is  for  a  single  M/M/1  queue  with  control  of 
service.  The  model  is  illustrated  below  in  Figure  5.  As  we 
shall  see,  it  is  in  some  sense  "dual"  to  the  admission-control 
model  of  the  previous  section,  and  as  such  differs  superficially 
from  service-rate  control  models  in  the  literature  (Cra- 
bi 1 1 (1972) , (1974) ,  Sabe t i ( 1 973 ) ,  L i ppman ( 1 975) ,  Jo<1982>).  The 
formulations  can  be  shown  to  be  equivalent,  however,  by  a  simple 
transformation  (see  Remark  4  below). 


Figure  5.  M/M/1  Queue  with  Control  of  Service. 


Once  again  jobs  arrive  from  a  Poisson  process  with  mean 
arrival  rate  X  ,  but  now  all  jobs  enter  the  system.  An  enter¬ 
ing  job  goes  immediately  into  service  if  the  server  is  free; 
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otherwise,  it  joins  the  (in-finite-capacity)  queue.  The  single 
server  performs  (potential)  services  according  to  an  exponential 
distribution  with  mean  service  rate  jjl  .  Control  is  exercised 
by  "accepting"  or  "rejecting"  potential  service  completions  when 
they  occur.  If  a  potential  service  completion  is  accepted,  then 
a  customer  departs  from  the  system  and  a  cost  c  1  0  is  in¬ 
curred.  If  it  is  rejected  then  no  cost  is  incurred  and  the 
system  state  remains  unchanged.  Alternatively,  in  the  case  of 
rejection  we  can  think  of  the  customer  in  service  being  "fed 
back"  to  receive  another  exponentially  distributed  service,  as 
illustrated  in  Figure  5.  As  before,  there  is  a  holding  cost  h 
per  unit  time  per  job  in  the  system.  Future  costs  are  conti¬ 
nuously  discounted  at  rate  e(  >  0  and  the  objective  is  to 
roinj.roii£  the  total  expected  discounted  cost  over  an  infinite 
horizon. 

Define  V< i ) :=  minimum  total  expected  discounted  cost  over 
an  infinite  horizon,  starting  from  state  i  .  Then  this  optimal 
value  function  satisfies  the  optimality  equation  (  i  )  : 

V(  i  )  =  [  h-i  +  ^U(i  +  1)  +  yttmi  n(c  +  i -1  )  ,k)<  i  )  >3/0  +  fx  +  <x  )  ,  (7) 

where  'v'(-l)  =^(0)  .  (Equivalently,  assume  that  all  potential 
service  completions  in  state  0  are  rejected.)  The  "duality" 
with  equation  (1)  for  the  admi ssi on -control  problem  is  obvious. 
The  arguments  for  the  validity  of  (7)  —  both  rigorous  and 
heuristic  --  parallel  those  for  (1).  An  alternate  version  of 
the  optimality  equation,  equivalent  to  (7),  is 

V<  i  )  =  [  h-i  +  \\K  i  +  1 )  +  /4  V(  i  ) 
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(8) 

+  yuminCc  -  ['J<i)-V(i-l)],0}]/(^  + /*•  +  «<)  , 

from  which  it  -follows  that  an  optimal  service  policy  will  be 
monotonic  if  W(  i  )  —  M < i — 1 >  is  non -decreasing  in  i  ,  that  is, 
if  i  )  is  convex.  In  this  case,  the  optimal  policy  will  be 
characterized  by  a  critical  number  i*:=  max < i :V< i > -V< i -1 >  <  c>  . 
The  quantity  V<  i  )  -  V<i-I>  is  the  benefit  (in  terms  of  expected 
discounted  future  cost  savings)  of  a  service  completion  in  state 
i  .  We  should  accept  a  potential  service  completion  if  and  only 
if  this  benefit  is  at  least  as  great  as  the  service  cost  c  , 
that  is,  if  and  only  if  i  >  i#  .  * 


*Our-  convention  is  to  accept  a  service  completion  in  case  cost 
exactly  equals  benefit. 


Just  as  in  the  admission-control  model ,  one  can  prove  that 
V ( i )  is  convex  by  value  iteration,  and  in  the  process  also  prove 
optimality  of  a  monotonic  policy  for  the  n-stage  problem.  The 
key  step  is  to  show  that  convexity  of  a  function  g( i )  implies 
convexity  of  the  function 

f  <  i  )  :  =  m  i  n  <  c  +  g  (  i  - 1 )  ,  g  (  i  )  >  .  (  9 ) 

The  proof  exactly  parallels  the  proof  of  concavity  of  f < i >  when 
f  < i )  is  defined  by  (5)  with  g( i )  concave. 

Ramacks- 

3.  Dp±lmall±y  oi  a  £ ull-saciilca  policy-  Under  an  optimal  policy 
with  critical  number  i*  ,  the  states  i  =  0 , 1 , . . . , i *-l  are 
transient.  In  other  words,  after  the  system  first  reaches  state 
i*  ,  there  will  always  be  at  least  \*  jobs  present.  It  may 
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seem  strange  that  an  optimal  pol icy  might  not  serve  unless  there 
are  at  least  a  certain  minimum  number  of  customers  are  present. 
In  -fact,  when  the  criterion  is  long-run  average  return,  So- 
bel<1932)  has  shown  --  by  a  different  approach  and  for  much 
more  general  systems  than  the  one  under  consideration  here 
that  i#  =  1  .  In  other  words,  a  " f u 1 1 -serv i ce "  policy  is  opti¬ 

mal  :  service  should  take  place  whenever  at  least  one  job  is  in 
the  system.  But  for  discounted  problems  it  is  possible  to  have 
i#  >  1  :  if  the  discount  rate  is  large  enough,  the  savings  in 

future  holding  costs,  after  discounting,  may  not  be  large  enough 
in  some  states  to  offset  the  service  cost,  which  is  incurred  now. 

4 .  Cnnttol  of  sec.ui.ce.  cafe  with  one  oc  mace  f easihle  ualues^.  I  n 
most  of  the  literature  on  service  control,  the  decision  maker  has 
the  option  of  selecting,  at  each  point  in  time,  a  service  rate 
0  from  a  feasible  set  A  ,  which  may  be  a  finite,  countably 
infinite,  -or  uncountable  set  <e.g.,  an  interval  EO,/*.]  ).  When 
service  rate  V  is  in  effect,  a  cost  is  incurred  at  rate  c <  X  ) 
per  unit  time.  (See,  e.g.,  Crab il 1 < 1 972) ,< 1 974) ,  Sabe t i < 1 973) , 
L i ppman < 1 975) ,  Jo<1982).)  By  contrast,  in  the  present  model 

potential  services  take  place  at  the  constant  rate  and  can  be 

accepted,  at  a  lump-sum  cost  c  ,  or  rejected,  at  no  cost.  There 
is  an  equivalence  between  the  two  types  of  mode  1  ,  which  may  be 
seen  as  foil ows . 

First,  our  mode  1  is  equivalent  to  one  in  wh i ch  the  dec i s i on 
maker  must  continuously  choose  between  serving  at  rate  jx  , 
incurring  service  cost  continuously  at  rate  c <  w.  )  !  =  c  ,  or 
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serving  at  rate  0  <i.e»,  not 
rate  0  -  For  f  w  i  t  h  c  ( fx. ) 
equation  (?)  can  be  rewritten 


serving),  incurring  service  cost  at 
de-fined  in  this  way,  the  optimality 
as 


V<  i  )  =  C  h  •  i  +  \  +1 ) 


+  mi  n{c(/t  )  +  fJL  V  <  i  - 1  )  ,  jx\>  <  i  )  >  ]  /  <  \  +  /t  +  oi  ) 


(10) 


which  is  the  optimality  equation  -for  the  latter  problem.* 


*As  in  the  admission-control  problem,  we  have  uni  form i zed  the 
transitions  by  imagining  that,  even  when  the  service  rate  is  0  , 
the  server  continues  to  per -form  "fictitious"  services  at  rate 
yu-  .  These  fictitious  services  have  no  effect  on  the  state  or 
costs,  so  that  the  resulting  Markov  decision  process  is  equiva¬ 
lent  to  one  in  which  only  the  "real11  transitions  (arrivals  and 
service  completions)  are  considered.  (See  L i ppman ( 1 975) ,  Ser~ 
fozo( 1979) . ) 


To  see  how  the  case  of  two  or  more  feasible  service  rates 
can  be  handled,  suppose  our  accept/reject  model  is  modified  as 
follows.  There  are  k  independent,  parallel  servers,  the  j - th 


of  which  serves  according  to  an  exponential  distribution  with 


mean  rate 

^  ,  where 

t 

zx  = 

Pot en t i al  ser v i 

c  e 

comp  1 e- 

tions  by 

server  j 

c  j 

J 

=  1 ,2, . . 

.  ,k) 

can  be  accepted, 

at 

a  cost 

c  ,  or 

■j 

rejected, 

at 

0  cost 

■ 

Assume  the  c  ys 
j 

ar  e 

non-de- 

creasing  in  j  .  Then  the  term  in  the  optimality  equation  (1) 
that  involves  minimization  will  be  replaced  by 

k 

£  i'  •  mi  n<c  +  i  -1  >  ,  V<  i  >  >  . 
j  =  1  j  j 

An  optimal  policy  will  accept  service  completions  in  state  i 

from  servers  j  =  1 , 2 , . . . , j *( i )  ,  and  reject  those  from  servers 

j  “  j  *<  i  >  +  1  ,  .  .  .  , k  ,  where  j*(i):=  max(\i  :  U<  i  )-U<  i -1  )ic  >  .  The 

j 

inductive  proof  that  U< i )  is  convex  goes  through  without 
change.  Thus  the  j*(i)/s  are  non-decreasing  in  i  :  the  more 
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jobs  in  the  system,  the  more  servers  should  be  "on".  This  re¬ 
sult,  which  is  of  interest  in  its  own  right,  can  be  used  to 
establish  optimality  o-f  a  monotonic  policy  -for  a  single-server 
system  in  which  the  server  can  be  operated  at  any  rate 

%  €  [  0 ,  ]  ,  at  a  cost  per  unit  time  c<  ,  where  c  <  Jr  )  is  a 

convex,  non -dec  re  as  i  ng  -function,  with  c(G>  =  0  .  One  simply 

approximates  c  (  X  )  by  a  piecewise-1  inear  function  over  k  sub- 

intervals  of  lenqth  i"  ,  .  ^  .with  ^  +  ...  +  J*  =  ju- 

1  k  1  k 

and  c  :=  CcCZlf  )-c(Xy  )  )/  )S  as  the  slope  over  the  j  —  th 
j  j  j-1  j 

subinterval.  Convexity  of  c<  ^  )  implies  that  the  c  are  non- 

j 

decreasing  in  j  ,  so  that  the  above  multi -server  model  applies. 

It  follows  that  the  optimal  service  rate  to  use  in  state  i 

equals  Jf  +  ...  +  if  ,  which  is  non-decreasing  in  i 

1  j  *  <  i  ) 

since  j*(i)  is  non-decreasing  in  i 

The  assumption  that  c(  ^  )  is  convex  is  not  restrictive, 
since  any  non -con vex  c<  %  )  can  be  replaced  by  i ts  lower  convex 
envelope  without  affecting  optimal  policies.  See,  e.g., 
Crab i 1 1 < 1 972) ,  Jo<1982>  for  details.  Intuitively,  the  reason  is 

that  any  rate  if  that  belongs  to  an  interval  where  c<  if)  is 
non-convex  can  be  achieved  at  a  lower  cost  by  mixing  two  rates, 
one  below  and  the  other  above  Jf 


5 .  Extensions 
policy  can  be 
analysis  for 

<  Crab i 1 1 ( 1972) > , 

<  L i ppman  < 1 975) ) , 


and  gananaiizaiions-  Optimality  of  a  monoton ic 
proved  by  extensions  of  the  above  inductive 
systems  with  state-dependent  arrival  rates 
non-linear  (convex)  holding  costs 

general  service- time  distribution  with  work- in- 
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system  as  the  state  variable  (M i tche 1 1 ( 1 970 ) ,  Dosh i ( 1 976) ) , 
and  phase-type  service-time  distributions  (Jo  and  St i dham( 1 983) ) . 
Papers  treating  the  service-control  problem  by  other  techniques 
include  Schassberger ( 1 973) ,  Gal  1  i sch ( 1 978) ,  and  Stidham  and 
Weber ( 1 984) . 

6.  Switching  costs  and  hysieceilc  pinltcies.-.  In  some  problems 
there  may  be  a  lump-sum  cost  associated  with  switching  the  ser¬ 
vice  rate  from  one  value  to  another,  in  addition  to  the  cost 

rates  associated  with  serving  at  various  rates.  (See  Lu  and 
Serf ozo( 1 931 )  and  the  references  cited  therein.)  This  switching 
cost,  for  example,  could  he  proportional  to  the  difference  be¬ 
tween  the  old  and  new  service  rates.  7 he  optimal  service  rate 
to  select  at  an  observation  point  now  depends  on  the  rate  V 

currently  in  use  as  well  as  the  number  of  jobs  i ,  so  that  the 

appropriate  state  variable  is  ( i , V  )  .  An  inductive  analysis 

can  be  used  to  show  that  an  optimal  policy  is  hysiecaiic ,  which 
means  that  it  is  characterized  by  a  sequence  of  pairs  of  conical 

limits ,  ( (  "V  f  v  )  ,  i  =0 , 1  , 2 ,  .  .  .  >  ,  such  that  for  each  state 

i  i 

( i , V )  the  service  rate  should  be  (1)  adjusted  upward  to  V  , 

i 

if  V  <  V  ;  (2)  adjusted  downward  to  V  ,  if  V  >  v  ; 

i  i  i 

and  (3)  left  unchanged  if  V  1  1/  L  V 

i  i 

3 .  Content  o£  Acciuals  and  Sendees 
in  Cycles  and  Secies  o£  .Queues 

Our  first  model  (Weber  and  St i dham( 1 983) )  for  control  of  a 
network  of  queues  is  for  a  cycle  of  m  queues,  in  which  a  job 
that  completes  service  at  node  (queue)  i  goes  to  node  i+1 


293 


(We  identify  node  m+ 1  as  node  I  .  >  The  system  under  considera¬ 
tion  is  illustrated  in  Figure  6.  Jobs  -from  outside  the  system 

enter  node  i  at  mean  rate  X  according  to  a  Foisson  process, 

i 


Figure  6.  Cycle  of  Queues  with  Control  of  Service. 


which  is  not  subject  to  control.  There  is  a  single  exponential 

server  at  node  i  who  performs  potential  services  at  mean  rate 

u.  .  A  potential  service  comp  1  e  t  i  on  may  be  accepted,  at  a  cost 

'  i 

ci  (which  may  be  negative),  or  rejected,  at  0  cost.*  The 


*Cf .  the  single-facility  service-control  model  in  Section  2. 
Continuous  control  of  service  rates  and  more  than  one  feasible 
rate  can  be  handled  by  the  same  transformations  and  extensions  as- 
used  there.  See  Remark  4. 


number  of  jobs  in  node  i  is  denoted  by  x  and  a  state  of  the 

i 

system  by  the  vector  x  =  (x  . x  )  ,  with  x  10  ,  i  = 

1  m  i 

1 , . . . ,m  .  While  in  state  x  ,  the  system  incurs  holding  cost  per 

unit  time'h(x)-Zih<x),  where  each  function  h  (x  >  is 

i  i  i  i 

non-negat i ve  and  convex  (but  not  necessarily  non-decreasing). 

Future  costs  are  continuously  discounted  at  rate  o<  >  0  and  the 

objective  is  to  minimize  the  total  expected  discounted  cost  over 

an  infinite  horizon. 


The  two  types  of  state  transitions  will  be  denoted 

x  - —  >  A  x  :  =  x  +  e  , 
i  i 

corresponding  to  an  arrival  at  node  i  ,  and 

x  —  >  T  x :  =  x  -  e  +  e  , 
i  i  i  + 1 
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corresponding  to  an  accepted  service  completion  at  node  i  and 

resulting  transfer  to  node  i+1  .  (Here  e  denotes  an  m-vector 

i 

with  0's  in  every  component  except  the  i-th,  which  equals  1  .) 

The  system  is  observed  at  every  arrival  and  potential  service 

completion,  so  that  observation  points  occur  exponentially  at 

rate  A  :=  2(  )  .  Define  k)(x):  =  minimum  total  expected 

i  i 

discounted  cost  over  an  infinite  horizon,  starting  from  state 
x  .  Then  U(x)  satisfies  the  optimality  equation 

m 

V<x)  ■  C  h(x>  +  2  ^  x> 

i  =  1  i  i 

m  <  1  1) 

+  2  jlt.minCV(x),c  +  V(T  x)>]/(A  +  eO  , 

i=l  '  i  i  i 

where  it  is  understood  that  the  minimization  operator  selects 

M(x>  if  x  =  O  .  (The  arguments  for  the  validity  of  (11) 

i 

parallel  those  for  the  single-facility  model  of  Section  2.)  We 
can  rewrite  (11)  in  the  equivalent  version 


V<x> 


m  m 

h  ( x )  +  2  "X  "vKA  x)  +  2  A  V<x) 

i  =  1  i  i  i  =  1  i 

m  (12) 

+  S  /tmin{0,  c  -  [V(x)-M(T  x)]>/(A  +  *<>  . 
i=l  i  i  i 


The  quantity  lv)(x)-U(T  x)  can  be  interpreted  as  the  benefit  of  a 

i 

service  completion  at  node  i  in  state  x  .  We  should  accept  a 
potential  service  completion  at  node  i  if  and  only  if  this 
benefit  is  at  least  as  great  as  the  service  cost  c 

i 

Weber  and  St i dh am( 1 983)  showed  that  an  optimal  policy  has 
the  following  property  (analogous  to  simple  mono tonicity  in  sin¬ 
gle-facility  probl ems) : 
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££±ac.  a  se.c.uic.fi.  complelina  a±  aoda  i  a£±ec.  a  j.oh  is 
icaasiacari  £cam  node  i  ±o  node  i±l  1+  ihe.  npiimal  aacuite  caie 
a±  node.  i  does  aoi  iacnaasa  aad  ±ha  apiimal  aaniilca  caias  a± 
o£haa  andaa  do  nai  daccaasa^. 

To  establish  this  property,  it  suffices  to  show  that  the  benefit, 

M(x)  -  (AT  x)  ,  of  transferring  a  job  from  node  i  to  node  i+1 
i 

does  not  increase  as  another  job  is  transferred  from  i  to 
i+1  ,  and  does  not  decrease  as  another  job  is  transferred  from 
node  j  to  node  j+1  ,  j  ^  i  .  In  other  words, 


(Ax) 

-  (AT 

i 

x) 

i 

1 

(AT 

i 

X  > 

i 

-  (AT 

i 

T 

i  i 

x)  , 

i 

(13) 

U(x) 

-  (AT 

X) 

£ 

(AT 

x> 

-  (AT 

T 

X  )  ,  j  5^  i  . 

(14) 

i  j  i  j 

It  can  be  shown  (by  mowing  one  job  all  the  way  around  the  cycle) 
that  (14)  implies  (13),  so  it  suffices  to  prove  (14). 

We  call  a  function  (Ax)  satisfying  (14)  muliimodulac , 
following  Ha j ek ( 1 983) ,  who  introduced  the  concept  in  a  different 
context.  In  a  sense,  mu  1 t i modu 1 ar i ty  is  a  mu  1  t i -d i mens i on al 
analogue  of  submodularity.  Weber  and  St i dham( 1 983)  prove  that 
the  optimal  value  function  (Ax)  is  multimodular  by  a  value- 
iteration  induction,  the  key  step  of  which  (cf.  the  single¬ 
facility  models)  is  to  show  that  multimodularity  is  preserved  by 
transformations  of  the  form 

f (x)  =  min(c  +  g(T  x),  g(x)J  . 

k  “  k 


296 


Applications.!  Sanies  a£  Queuas  with  Conical  a£  Acciiials  andZoc 
S£LUiC£S- 

These  results  can  be  applied  to  a  series  of  queues 

<  i  =1  ,  2 ,  .  .  .  ,m-l  )  ,  with  control  o-f  the  < Poisson)  arrival  process  at 

the  first  node,  by  adding  to  the  series  a  dummy  node  m  ,  with  no 

holding  cost  and  an  infinite  supply  of  jobs,  and  letting  that 

node  receive  all  output  from  node  m-1  and  generate  all  input  to 

node  1  .  Accepting  or  rejecting  service  completions  at  the 

dummy  node  corresponds  to  accepting  or  rejecting  arrivals  to  the 

first  node  in  the  series.  The  service  cost  c  at  the  dummy 

m 

node  is  the  negative  of  the  reward  r  earned  when  an  arriving 
job  is  accepted  at  node  1  . 

The  monotonicity  result  referred  to  above  implies  that  the 
benefit  of  accepting  an  arriving  job  does  not  decrease  as  another 
job  is  transferred  from  any  node  j  in  the  series  to  node  j+1  , 
or  <by  combining  a  sequence  of  such  moves)  as  a  job  is  removed 
from  any  node  j  in  the  series.  Thus  an  optimal  admission- 
control  policy  will  be  more  likely  to  accept  if  either  of  these 
two  types  of  state  change  is  made,  which  generalizes  the  result 
in  Section  1.  Although  the  present  model  implicitly  assumes  that 
the  service  rate  at  each  of  the  nodes  in  series  is  also  con¬ 
trollable,  the  results  apply  to  a  system  with  fixed  service 
rates,  as  long  as  the  marginal  holding  costs  do  not  increase  from 
node  i  to  node  i+.t  ,  i  =  1,2,..., m-1  .  For  in  this  case  the 
problem  with  fixed  service  rates  is  equivalent  to  one  in  which 
service  completions  can  be  accepted  or  rejected  but  there  is  no 
cost  of  accepting  a  service  completion;  in  the  latter  problem  it 
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will  always  be  optimal  to  accept  all  service  completions,  since 

they  move  a  job  to  a  cheaper  node  at  no  cost.  Note  that  this 

ordering  o-f  the  marginal  holding  costs  implies  that  each  holding 

cost  function  h  <x  )  is  non-decreasing,  since  the  marginal 

i  i 

holding  cost  at  node  m  is  identically  zero.  See  Weber  and 
St i dham< 1 983)  for  further  discussion. 

The  results  of  Weber  and  St i dham< 1 983)  may  be  compared  to 
those  of  Lazar (1983),  who  also  analyzes  control  of  the  arrival 
rate  to  the  first  queue  in  a  series  of  queues.  Lazar  studies  a 
steady-state  version  of  the  problem,  in  which  the  objective  is  to 
maximize  expected  steady-state  throughput,  subject  to  a  con¬ 
straint  on  the  total  expected  response  time  (time  to  pass  through 
all  nodes)  of  a  job.  By  a  Lagrange -mu  1  t i p 1  i ers  argument,  this 
problem  can  be  seen  to  be  equivalent  to  the  problem  of  choosing 
an  arrival  rate  to  the  first  node  that  maximizes  a  weighted  sum 
of  the  arrival  rate  <i.e.,  the  throughput  of  the  system)  and 
minus  the  steady-state  expected  number  of  customers  in  the  sys¬ 
tem.  The  latter  problem  is  clearly  equivalent  to  the  problem  of 
max i m i z i n  g  the  1 ong-ru n  a v e  r  age  net  benefit  in  a  sys t  em  w i t  h 
fixed  reward  per  admitted  customer  and  linear  holding  costs  a t 
each  queue,  with  the  same  holding-cost  coefficient:  a  problem 

that  belongs  to  the  class  considered  in  Weber  and  St i dhamC 1 983) . 

Lazar's  analysis  shows  that  an  optimal  control  policy  has  a 
cr i t i cal -number  form  with  respect  to  the  total  number  of  jobs  in 
the  series:  "end-to-end"  control  is  optimal.  This  result  may 
seem  to  contradict  those  in  Weber  and  St i dham< 1 983) ,  in  which 
optimal  policies  can  < and  generally  do)  have  a  more  general 
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structure.  But  in  -fact  there  is  no  contradiction,  since  Lazar 
derives  his  result  by  -First  considering  the  single-facility  Nor¬ 
ton  equivalent  to  the  series,  for  which  (of  course)  an  optimal 
pol  icy  is  a  function  of  the  total  number  in  the  system.  There¬ 
fore,  by  the  way  in  which  Lazar  has  formulated  his  problem,  he 
has  from  the  beginning  restricted  attention  to  policies  based  on 
the  total  number  of  customers  in  the  series. 

Q±h£C  N£±wnck.=Con±n:ol  tlndals^. 

Dav is( 1977)  considered  two  exponential  servers  (with  mean 

rates  fx,  and  jx  )  in  parallel,  each  with  its  own  queue,  and  a 

renewal  arrival  process  --  that  is,  i.i.d.  interarrival  times 

distributed  as  a  random  variable  T  .  The  system  control  1 er  may 

reject  (a~CO  an  arriving  job,  admit  it  to  queue  1  (a=l),  or  admit 

it  to  queue  2  (a=2) ,  based  on  the  state  x  =  (x  ,x  )  at  the 

1  2 

instant  of  arrival ,  where  x  :=  number  of  jobs  at  queue  j  ( i n- 

j 

eluding  the  one  in  service,  if  any),  j  =  1,2  .  Figure  7 

illustrates  the  model .  An  admitted  customer  generates  a  (deter- 

Figure  7.  Control  of  Admission  and  Routing  to  Two  Parallel 
Queues 

mini stic)  reward  r  .  There  is  a  holding  cost  rate  h  (x  )  per 

J  j 

unit  time  while  x  jobs  are  at  queue  j  ,  where  h  (*)  is  a 

j  j 

convex,  non-decreasing  function,  j  =  1,2  . 

An  inductive  proof  based  on  value  iteration  shows  that  an 

optimal  admission/routing  policy  (a(x)>  is  monotonic  for  this 

problem,  in  the  sense  that  a(x)  =  0  impl ies  a(x+e  )  =  0  ,  j  = 

j 
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1,2  ;  in  other  words,  the  rejection  region  R:=  <>::  a(x)  =  0 >  is 
an  inc.n.£asing  sa±.  The  induction  also  shows  that  the  "switching 
curve"  is  mono tonic:  i f  admitting  to  queue  2  (queue  1  )  is 

preferable  to  admitting  to  queue  1  (queue  2)  in  state  x  , 

then  it  will  remain  so  in  state  x  +  e  (x  +  e  )  .  Finally,  an 

1  2 

additional  property  of -the  rejection  region  is  demonstrated  by 
the  induction:  as  we  move  closer  to  the  switching  curve,  we  are 

more  likely  to  reject  a  job.  To  illustrate  the  application  of 
this  property,  note  that  each  of  the  rejection  regions  illus¬ 
trated  in  Figure  8  below  (for  the  case  of  two  symmetric  queues) 
is  an  increasing  set,  but  8(a)  and  8(b)  violate  this  property 
and  thus  cannot  be  rejection  regions  for  an  optimal  policy.  All 

Figure  8.  Examples  of  Increasing  Sets  for  Paral 1 e 1 -Queue  Problem 

these  properties  of  an  optimal  pc«l  icy  follow  from  verifying  that 
the  optimal  value  function  is  concave  in  each  argument,  submodu- 
1  ar  ,  and  satisfies  a  third  condition.  The  three  conditions  taken 
together  constitute  the  analogue  of  mu  1 t i modu 1 ar i  ty  for  maximiza¬ 
tion  problems  in  two  dimensions.  Control  of  routing  with  paral¬ 
lel  queues  was  also  considered  by  Fame  1  1  (  1  976)  ,  W  i  nston  ( 1  977)  , 
Weber (I  978),  and  Ephremides,  Uaraiya,  and  Wal r and( 1 979)  among 
others. 

Ghone i m( 1 980 )  (see  also  Ghoneim  and  S't  i  dham(  1  983) )  studied 

two  exponential  servers  in  series  (with  mean  service  rates  fx 

and  jx  ),  each  with  an  infinite-capacity  queue.  Arrivals  to 
2 

queue  j  are  from  a  Poisson  process  with  mean  rate  \  ,  j  = 

j 

1,2  .  Jobs  arriving  to  queue  1  must  go  on  to  queue  2  after 
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■finishing  service  at  server  1  .  Jobs  arriving  to  queue  2 

leave  the  system  after  finishing  service  at  server  2  .  (See 

Figure  9.)  The  model  thus  describes,  for  example,  a  simple 
communication  system  consisting  of  two  channels  in  series  with  a 
combination  of  local  and  long-distance  traffic. 

Figure  9.  Control  of  Admission  to  Two  Queues  in  Series. 

With  the  same  reward  and  cost  structure  as  for  the  paral 1  el -queue 
model  .just  discussed,  an  induction  based  on  value  iteration 
establishes  that  the  same  three  properties  hold  for  the  optimal 
value  function.  Thus,  for  example,  the  optimal  rejection  region 
for  jobs  arriving  to  queue  1  i s  an  increasing  set.  These  proper¬ 
ties  also  rule  out  certain  increasing  sets  as  candidates  for  the 
optimal  rejection  region,  namely  those  whose  boundaries  have 
horizontal  segments  of  length  greater  than  one. 

A  typical  optimal  rejection  region  (from  a  numerical 
example)  has  the  shape  shown  in  Figure  10.  (This  shape  for  the 

Figure  10.  Typical  Optimal  Rejection  Region  for  Two  Queues  in 
S'er  i  es 

rejection  region  is  characteristic  of  both  discounted  and  undis- 
counted  problems.)  It  may  be  instructive  to  compare  this  rejec¬ 
tion  region  to  that  implied  by  one  of  the  flow-control  rules 
proposed  in  the  communications  literature.  "End-to-end"  control 
( Lazar ( 1 983) )  suggests  putting  an  upper  bound,  k  ,  on  the  total 
number  of  jobs  (messages)  in  a  particular  path  in  the  network. 
The  corresponding  rejection  region  has  a  straight-line  boundary, 

x  +  x  =  k  .  By  contrast  the  boundary  of  an  optimal  rejection 
1  2 
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region,  as  illustrated  in  Figure  10,  is  non-linear  and  non- 
symmetric.  Numerical  examples  (cf.  Abde  1  -Gawad(  1  ?S4)  )  have  shown 
that  an  optimal  policy  can  yield  benefits  up  to  15'/.  larger  than 
those  from  the  best  end-to-end  control.  Implementing  an  optimal 
control  requires  that  the  system  controller  keep  track  of  the 
number  of  jobs  at  each  node  (queue)  in  a  path,  rather  than  just 
the  total  number.  So  additional  bookkeeping  overhead  would  be 
requ i red . 

Hajek(1982)  has  considered  a  general  two-node  model  that  in¬ 
corporates  many  of  the  features  of  both  the  parallel  and  series 
queue  models  (but  not  the  option  of  accepting  or  rejecting  arriv¬ 
ing  jobs).  In  Hajek's  model,  queues  1  and  2  receive  Poisson 

arrivals  at  rates  "X  and  X  ,  respectively.  A  third  stream  of 

1  2 

Poisson  arrivals  at  rate  X  can  be  routed  to  either  queue.  The 

stations  have  fixed  exponential  servers  with  rates  u.  and  yu. 

1  2 

and  a  thi-rd  exponential  server  with  rate  fx  that  can  be  as¬ 
signed  to  either  queue;  jobs  whose  service  is  completed  by  these 
servers  leave  the  system.  There  are  two  additional  exponential 

servers,  with  rates  V  and  V  ,  the  first  of  which  serves 

1 2  21 

queue  1  and  sends  jobs  to  queue  2,  the  second  of  which  serves 
queue  2  and  sends  jobs  to  queue  1  .  Service  completions  by 

these  servers  can  be  "accepted"  or  "rejected";  the  jobs  arriving 
at  rate  X  are  to  be  routed  to  one  or  the  other  of  the  queues; 

and  the  server  with  rate  yU.  i  s  to  be  assigned  to  one  or  the 

other  of  the  queues.  All  these  decisions  are  to  be  made  dyna¬ 
mically  as  a  function  of  the  number  of  jobs  in  the  two  queues. 
Hajek  uses  an  inductive  proof  to  establ i sh  the  existence  of  a 
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monotonic  switching  curve,  on  which  all  these  decisions  can  be 
based.  His  analysis  accommodates  convex  holding  costs  at  each 
queue  and  costs  associated  with  each  switching  decision. 

So  tar  little  progress  has  been  made  in  characterizing  or 
computing  optimal  control  policies  tor  more  complicated  networks 
than  those  we  have  discussed.  As  tar  as  I  know,  the  only  suc- 
cesstul  analysis  ot  a  network  with  more  than  twio  nodes  is  that  ot 
Weber  and  St i dham< 1 983) .  An  essential  feature  ot  their 
cycles/series  model  is  the  absence  of  branching  or  routing 
choices.  As  we  have  seen,  both  branching  and  routing  choices 
have  been  successfully  analyzed  in  the  context  of  two-node  prob¬ 
lems.  But  attempts  to  extend  these  results  to  more  than  two 
nodes  have  failed.  In  particular,  the  three-node,  series-paral¬ 
lel  network  discussed  in  the  introduction  (see  Figure  3)  ap¬ 
parently  cannot  be  studied  by  the  inductive  approach.  (See 
Abde 1 -GawadC 1 984)  for  further  discussion.) 
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SOME  RECENT  ADVANCES  IN  ACTIVITY  NETWORKS 


S.  E.  Elmaghraby 
NCSU 

INTRODUCTION 

This  paper  is  a  subset  of  a  much  larger  monograph  on  Progress  in 

ANs^  that  is  currently  under  preparation.  Here,  we  limit  ourselves 

* 

to  three  specific  areas:  Project  compression  in  DANs  ,  statistical 
§ 

estimation  in  PANs  ,  and  reducibility  of  ANs .  (For  a  more  detailed 
exposition  of  the  acronyms  DANs  and  PANs,  see  the  book  by  Elmaghraby 
[  8].) 

The  CPM  model  [  8  ]  resolved  the  questions  that  first  come  to  mind 
in  DANs,  whose  analysis  is,  fortunately,  of  elementary  nature.  They 
enriched  our  vocabulary  with  such  important  concepts  as:  Critical 
path  (CP),  event  "slack"  and  activity  "float",  earliest  and  latest 
node  realization  times;  etc.  The  two  major  outstanding  problems  are: 
optimal  project  "compression",  and  optimal  "resource  allocation"; 
neither  of  which  can  be  termed  "elementary".  Indeed,  the  latter 
problem,  that  of  optimal  resource  allocation,  is  known  to  be  NP -Complete 
and  the  achievement  of  the  optimum  is  impractical  except  in  the  most 
trivial  of  networks.  This,  in  turn,  signifies  that  the  problem  is 
difficult  to  resolve  by  an  "efficient"  algorithm  for  any  realistic  AN; 
hence  the  rash  of  heuristic  procedures  that  yield  "good"  answers. 


^Activity  Network 

i k 

Deterministic  Activity  Network 

§ 

Probabilistic  Activity  Network 


Fortunately,  the  problem  of  project  compression  Is  amenable  to 
resolution.  Its  significance  resides  in  the  ability  to  specify  the 
most  efficient  utilization  of  investments  in  the  speeding  up  of  a 
project.  Alternatively,  it  serves  to  alert  the  manager  to  the  range 
of  requirements  of  additional  investments  should  he  wish  to  deviate 
appreciably  from  the  ''normal11  flow  of  work  in  the  project. 

The  problems  of  statistical  estimation  are  concerned  with  the 
determination  of  probability  distribution  functions  (pdf)  of  the 
time  of  realization  of  nodes  (*  events)  when  the  durations  of  the 
activities  are  random  variables  (r.v.).  In  addition,  a  host  of  other 
issues  are  raised  relative  to  the  criticality  of  paths  and  activities, 
whose  answer  is  difficult  to  compute,  despite  their  theoretical 
simplicity.  Approximations  and  bounding  techniques  are  used  to  give 
the  analyst  the  insight  desired. 

Interestingly  enough,  the  above  two  classes  of  problems  give  rise 
to  the  third  issue  discussed  in  this  report,  viz.,  the  reducibility  of 
ANs. 

One  final  remark.  The  three  main  sections  of  this  report  are 
almost  independent.  This  may  have  introduced  some  redundancy,  but 
should  facilitate  reading. 
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OPTIMAL  PROJECT  COMPRESSION 


The  mathematical  statement  of  the  problem  runs  as  follows: 

minimize  £  C..(y  .)  (1) 

(ij)eA  1J  ^ 

such  that  the  precedence  constraints  are  respected,  and  the  project 
is  completed  on  or  before  time  Tfl.  Let  t^  denote  the  time  of  realiza¬ 
tion  of  node  i^  Then  if  activity  (ij)eA,  with  the  arrow  in  the  direc¬ 
tion  i.  j_»  we  must  impose  the  restriction 

"ti  +  tj“yij-°  V<i^€A  (2) 

The  completion  time  requirement  adds  the  constraint: 


t..  -  t  >  -T 
1  n  —  fi 


(3) 


Here  we  assume  that  the  "start  node"  is  node  1^,  and  that  the  "terminal1 
node  is  n;  whence  the  set  N  =  {1,2,  n} .  Finally,  the  activity 


duration  y^  is  bound  from  below  by  a  lower  limit  0,  and  from 


above  by  an  upper  limit  u^  >  i.e.,  0  <_  y  _<  u^y  (The 


only  instance  in  which  y^  is  permitted  to  be  0  is  in  the  case  of  "dummy 


activities;  see  ref.  [8  ]  for  a  detailed  explanation  of  the  utility  of 
these  activities.)  It  is  more  convenient  to  re-write  this  double 


inequality  as 


The  mathematical  program  (l)-(4)  has  been  extensively  studied  under 
the  various  manifestations  of  the  individual  activity  time-cost  function 
,  (see  Chapter  2  of  ref.  [8  ]  for  details).  We  devote  the  remainder 
of  this  section  to  the  analysis  of  the  case  in  which  C^(y^j)  is  convex 
decreasing. 

In  passing,  we  mention  that,  to  the  best  of  our  knowledge,  the 
first  treatment  of  convex  cost  functions  was  by  Jewel  [14]  in  1965. 
However,  his  motivation  stenaned  from  PANs,  where  his  objective  was  to 
balance  the  cost  of  project  compression  versus  contract  penalties  and 
bids  by  competitors.  In  particular,  he  addressed  the  following  question: 
A  fixed  project  schedule  must  be  determined  now  despite  uncertainty  in 
activity  durations.  Based  upon  the  difference  between  the  allotted  time 
interval  and  the  "free"  time  needed  by  the  job,  corrective  action  may 
have  to  be  taken  to  stay  within  the  fixed,  predetermined,  schedule. 

The  problem  is  to  determine  that  schedule  that  minimizes  the  expected 
amount  of  extra  effort  expended  to  stay  on  schedule. 

He  assumes  that  if  the  allotted  time  to  an  activity  (ij)  is 
z^t*  tj-t^),  and  if  the  activity  (random)  duration  is  then  the 

cost  function  g(z|Y)  is  convex  in  z  for  every  realization  y  of  Y^  for  all 
activities.  In  Figure  1,  three  sample  curves  are  given,  and  all  differ 
in  the  cost  incurred  if  y  is  actually  less  than  z:  (I)  represents 
resulting  economy  through,  for  instance,  resource  utilization  elsewhere; 

(II)  would  obtain  if  the  committed  resources  are  irretrievable;  and 

(III)  represents  the  need  to  spend  more  effort  because  of,  say,  disposal 
activities . 
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Figure  1.  Three  alternative  curves  of  effort 
required  as  a  function  of  actual 
completion  time  y. 


Convexity  of  g  in  z  for  all  y  guarantees  the  convexity  of  the 
expected  cost  in  the  decision  variable  z,  as  well  as  the  convexity  of 
the  sum  of  costs  over  all  activities,  i.e.,  the  convexity  of 


g(z) 


j  g(z|y)  dF(y) 

0 


where  F(y)  is  the  pdf  of  y,  and  of 


G(Z) 


l  g^Cz^) 


(ij) 


A  typical  curve  of  g(z)  is  shown  in  Figure  2  . 


Figure  2.  Resulting  expected  cost. 


From  this  point  onwards,  the  treatment  is  devoid  of  uncertainty, 
and  is  reduced  to  minimizing  G(Z)  subject  to  the  usual  precedence 
constraints.  Jewell  advocates  approximating  g(z)  linear  segments, 
and  then  applying  Fulkerson’s  algorithm.  (In  the  example  solved  in 
the  paper,  the  function  g(z)  was  quadratic  in  z  and  he  used  a  quadratic 
programming  algorithm.)  More  elaborate  approaches  to  this  problem  are 
the  subject  matter  of  the  remainder  of  this  section. 

We  start  with  the  quadratic  case,  which  can  be  resolved  analyti¬ 
cally,  while  the  general  case  is  approximated  by  linear  splines. 

As  will  become  evident  below,  the  conclusions  are  more  transparent 
if  we  discuss  two  cases  separately:  The  first  assumes  that  the  deriva¬ 
tive  dC/dy  (sometimes  also  denoted  by  C^)  is  continuous  for  y€[fc,°°); 
and  the  second  accepts  discontinuities  at  i  >  0  and  u  >  l. 

Our  discussion  covers  two  trains  of  thought:  The  first  is  to 
achieve  exact  solutions,  and  the  second  is  to  approximate  the  optimum. 
As  will  be  seen,  each  raises  its  own  secondary  problems. 


1 .  EXACT  SOLUTIONS 


Case  l^s  Continuous  Derivative 

Since  C  is  quadratic  with  continuous  derivative  in  the  interval 
[£,<»),  we  may  assume  it,  without  any  loss  of  generality,  to  be  of  the 
form 

C(y)  *  b  +  B(u-y)^  £  <  y  <  u  (5) 

Note  that  C  is  tangent  to  the  line  C(y)  ■  b  at  y  ■  u,  where  C  *  0; 
see  Fig.  3 .  We  may  go  one  step  further  and  simplify  (5)  to 

C(x)  -  b  +  (u  -  x)^ 

by  setting  u  *  u  /IT  and  x  ■  y  /IT  ,  the  "normalized"  values  of  u  and  y, 
respectively.  Henceforth,  we  drop  the  M*M  from  the  u  for  the  sake  of 
simplicity,  since  the  context  reveals  to  which  value  of  the  upper  bound 
reference  is  made.  In  general,  analysis  proceeds  with  the  normalized 
variables  {x^}  to  the  end,  then  it  is  translated  into  the  original 
{y^}-  variables. 

We  introduce  one  mild  assumption  whose  justification  is  easy  to 

establish:  T  is  such  that  no  activity  will  be  at_  its  lower  bound; 
s 

i.e.,  at  the  optimum,  y  >  Jt^ ,  (ij)eA.  (Note  that  if  is  small 
enough,  this  condition  will  be  automatically  satisfied.)  We  shall  refer 
to  it  as  Condition  L. 

In  ref.  [  7  ]  the  following  characterization  of  the  optimum  solution 
is  given.  Let  the  nodes  of  the  network  be  realized  at  times 
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0  »  t-  <  t,  <  t,  <  ...  <  t  »  T  ,  where  t.  corresponds  to  the  kth 

1-1 2~  H~  ~  n  3  \ 

earliest  node*  Let  D(t)  denote  the  sum  of  the  derivatives  that  are 

"in  progress"  at  time  t.  Then  the  given  schedule  of  activities  is 

optimal  iff  D(t)  is  constant  for  all  xe[0,T  ]. 

s 

There  are  two  remarks  to  be  made  about  that  result.  First,  it  was 

proved  by  elementary  variational-type  arguments.  Second,  though  it 

characterized  the  optimum,  it  gave  no  procedure  for  achieving  it.  The 

following  development  pertains  to  these  two  remarks. 

Note  that,  since  is  quadratic  decreasing  in  the  interval 

^ij’Uij]*  t^en  conatant  over  [u^,®],  it  is  convex.  Add  to  this  that 

all  the  constraints  are  linear,  and  the  conclusion  immediately  follows 

that  the  necessary  conditions  of  Kifhn  and  Tucker  for  nonlinear 

programming  are  also  sufficient.  It  is  a  simple  matter  to  verify  that 

these  conditions  translate  directly  into  the  condition  D(t)  •  constant, 

T€[0,TJ;  details  of  the  proof  may  be  found  in  [11].  This  gives  a  more 
s 

direct  proof,  albeit  non-element ary. 

As  to  the  problem  of  algorithmic  solution,  ref.  [11]  also  gives 
such  an  algorithm.  It  is  based  on  the  following  results: 

Proposition  1  The  necessary  and  sufficient  conditions  for 
optimality  stated  above  are  equivalent  to  the  conditions: 


-a  for  i  ■  1 


I  <V  -  V 


0  for  i  #  l,n 
a  for  i  ■  n 


where  d  is  the  (normalized)  reduction  in  activity  (ij),  and 


a  is  some  constant. 
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As  a  preliminary  to  the  next  result,  let  the  "critical  subnetwork  (CSN)M 
denote  the  set  of  longest  paths  in  the  network.  It  is  easy 
to  establish  that  we  may  deal  with  the  optimal  "incremental"  reductions 
{d^j ^ }  at  the  rth  iteration,  in  place  of  the  "total"  reductions  {d^}. 
The  following  two  assertions  ensure  the  solvability  of  the  system  of 
equations  in  the  (d^  }  unknowns. 

Proposition  2  If  the  CSN  contains  K  arcs,  there  shall  be 
K  simultaneous  linear  equations  relating  the  values  of  the 
individual  (incremental)  reductions  {d£^ }  at  the  rth  iteration 
to  the  constant  a^  . 

The  proof  of  this  theorem  rests  on  the  fact  that  if  the  CSN  has 
m0<  n)  nodes  and  K  arcs,  there  are  m-1  Independent  equations  (6)  (the 
first  equation  is  discarded) ,  and  exactly  K-m+1  "fundamental  loop" 
equations,  for  a  total  of  K  independent  equations  in  the  K  unknowns. 

We  shall  refer  to  this  system  of  linear  equations  notationally  as 


A(r) 

a  ^n-l 


where  B  is  a  K  x  K  matrix  of  entries  0,+l;DisaK*l  column  vector 
(  r  ) 

of  {d*  },  and  the  vector  e  .  is  a  vector  of  zeros  except  in  position 

lj  —m-l 

m-1,  where  it  has  entry  1. 

Proposition  3  The  system  (7),  in  the  "incremental"  reduc- 
tions  {d^  },  possesses  a  unique  solution. 


We  are  now  assured  of  the  (unique)  determination  of  d^'as  a  func- 


r  _(r)  A  (r)  (r)  (r) 

tion  of  a  ,  say  d^  *  v^  a 


It  remains  to  determine  a' 
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is  accomplished  by  remarking  that  the  current  CSN  can  be  "compressed1*  un- 
abatedly  until  one  of  the  following  two  eventualities  occur: 

(i)  Another  (currently  non-critical)  path  becomes  "critical",  or 
(ii)  The  specified  duration  Tg  is  reached. 

In  the  first  eventuality  the  CSN  must  be  augmented  by  the  new  path 
(or  paths);  hence,  the  current  system  of  equations  (7)  are  no  longer 
valid  and  must  be  updated.  In  the  second  eventuality,  we  terminate, 
since  the  optimum  is  in  hand. 

Denote  the  permissible  reduction  to  eventuality  (i)  by  a^r^  9  an<* 
the  permissible  reduction  to  eventuality  (ii)  by  a^ .  Then,  clearly, 

a(r)  -  min  (.<'>.  a<r>}.  (8) 


Back  substitution  into  the  expressions  for  d^  yields  the  respective 
values . 

The  suggested  algorithm  may  be  briefly  sketched  as  follows. 

Step  (0)  Set  each  activity  at  its  upper  bound  x  *  u„  V(ij)eA. 
Compute  the  node  realization  times  (t^)  and  define  the  CSN. 


Denote  the  set  of  arcs  in  the  CSN  by  Kv  '  in  iteration  r. 

Step  (1)  Compute  in  terms  of  a^  ;  (ij)^K^r\  by  solving  the 

system  of  linear  equations  B^D^  *  1^. 

Step  (2)  Compute  a^  *  min(a^r\  a^)  if  K  <  A,  and  a^  -  a^ 


if  K  *  A  (i.e.,  if  all  activities  in  A  are  critical). 

Step  (3)  Compute  *  T^r  ^-a^  and  return  to  Step  (1)  with 


r  *  r  +  1  if  T  >  Tg ;  otherwise  halt. 
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Finally,  we  have 

Proposition  4  The  stated  algorithm  yields  the  desired  solution 
in  a  finite  number  of  steps. 

The  suggested  procedure  is  "straightforward"  except  for  two  rather 

complex  operations:  first  the  inversion  of  the  matrix  of  coefficients 

of  (7),  and,  second,  the  determination  of  aj^ .  The  first  is  an  opera- 

3 

tion  of  order  of  complexity  not  exceeding  0(N  );  and  the  second  is  of 
2 

0(N  ) .  (Details  of  capitalizing  in  iteration  r  on  the  availability  of 
the  inverse  of  B^r  ^  from  the  previous  iteration,  as  well  as  the  deter¬ 
mination  of  the  order  of  complexity  of  the  operations  involved  are  given 
in  ref.  [11].) 

Resolving  the  problem  for  a  specified  completion  time  Tg  also  yields 
the  approach  to  obtaining  the  complete  optimal  time-cost  function  for  all 
feasible  Tg  (provided  that  Condition  L  is  satisfied). 

The  procedure  has  been  programmed  in  FORTRAN  4  on  the  NCSU's 
IBM  370/165.  Details  and  documentation  may  be  found  in  [18]. 


Case  2:  Discontinuous  Derivative 


The  cost  function  and  its  derivative  would  then  appear  as  shown  in 
Figure  4  .  We  shall  refer  to  the  derivative  diagram  as  the  kilter 
diagram  (KD) .  In  it  we  distinguish  four  regions: 


Region  F  (for  "float"): 

Region  E  : 

Region  C  (for  "compressible") : 
Region  L  (for  "low  bound") : 


u  <  y  <  «;  C'-  »  0 
y  =»  u;  <  C^(u) 

Z  <  y  <  u;  C^(u)  >  C^>  C(£) 
y  =  Z;  C ^  =  00 


The  algorithm  specified  below  can  be  understood  only  in  conjunc¬ 
tion  with  the  (necessary  and  sufficient)  Kuhn-Tucker  conditions  for 
optimality,  on  the  basis  of  which  it  is  easy  to  construct  the  following 
"optimality  table" . 


State  of  Actiyitv  Flow  Condition  Domain  of  C 

Durat ion 


la< 

yij 

‘  'j  -  ‘l 

f  ■  0 

F 

ta< 

yij 

=  u .  . 

ij 

=  t  .  "  t  . 
j  1 

0  <  f  .  . > -C . (u  . ) 

“  ij  ij  ij 

E 

Z. .  < 

yij 

»  c . 

J 

-  t  <  u 

i  -  a 

C  '(i..)  >  f.  ,  =  -c:.(y..) 
ij  ij  ij  LJ 

c 

‘u  " 

yij 

-  t , 

J 

-  t.  <  u.  , 
i  ij 

L 

Table  l. 

Optimality  Table 
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In  passing,  it  is  worth  remarking  that  the  applicability  of  the  Kuhn- 
Tucker  conditions  to  general  convex  (not  necessarily  quadratic)  C ^  func¬ 
tions,  as  noted  by  Elmaghraby^  ^  was  first  applied  to  quadratic  cost 
functions  by  Kapur who  achieved  the  KD  and  reasoned  the  optimality 
table  shown  above.  Unfortunately,  his  subsequent  development  may  not 
achieve  the  optimum,  as  demonstrated  in  [18].  This  is  mainly  because 
of  his  strict  adherence  to  the  concept  of  "maximal  flow  cutset",  which 
had  validity  in  the  case  of  linear  cost  functions  (see  [12]),  but  has 
no  equivalent  validity  in  the  case  of  nonlinear  functions,  albeit  it 
has  some  utility  in  parts  of  the  calculations,  as  presently  demonstrated. 

Our  algorithm  rests  on  the  following  crucial  observation:  At  itera¬ 
tion  r,  if  all  activities  in  the  CSN  were  in  State  C  then  we  would  pro¬ 
ceed  in  an  identical  fashion  to  our  procedure  of  Case  1  discussed  above. 

It  is  not  difficult  to  see  that  Case  1  is,  in  some  sense,  a  "degenerate" 
case  of  our  present  conditions,  in  which  the  KD  has  the  shape  of  Figure  5 
in  which  region  E  has  disappeared  and  region  L  is  never  reached  (because 
the  duration  is  never  realized,  according  to  Condition  L) .  But  since 
now  we  must  contend  with  regions  E  and  L,  the  basic  procedure  of  Case  1 
should  be  modified  to  reflect  the  new  concerns  that  have  resulted  from 
these  two  regions.  In  particular: 

(a)  An  activity  in  State  E  cannot  be  shortened,  though  the 
"flow"  through  it  may  be  increased  until  it  equals  C^j(u^j), 
at  which  time  the  activity  may  be  shortened. 

(b)  An  activity  in  State  L  is  at  its  lowest  possible  duration, 
and  therefore  cannot  be  shortened  at  all,  though  "flow" 
through  it  may  be  increased  indefinitely. 
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Careful  study  of  the  Optimality  Table  immediately  reveals  that  any  flew11 
increase  in  the  CSN  is  translatable  (linearly)  into  change  in  duration 
of  the  individual  activities  in  the  CSN,  Consequently,  the  limitations 
on  a  ,  the  amount  of  permissible  "compression"  in  iteration  r,  should 
be  augmented  by  the  two  most  recent  considerations,  as  follows: 

( r ) 

a^  :  to  be  derived  from  the  augmented  flow  such  that  activity 
(ij)  moves  from  region  E  to  region  C  (or  equivalently,  f 
which  was  strictly  <  C'j(u^)  is  increased  to 
(  r ) 

a^  :  to  be  derived  from  the  augmented  flow  such  that  activity 
(ij)  moves  from  region  C  to  region  L  (or  equivalently, 
which  was  strictly  >  is  decreased  to  * 

As  before. 


(r)  ,  a(r)  a(r)  _  ( r )  \ 
a  min(a^  ,  *  3^  *  3^  / 


(r)  (r^ 

in  which  a|  and  a^  J  are  determined  as  before. 

We  refrain  from  giving  the  full  details  of  the  algorithm;  they  may 
be  found  in  [11].  However,  we  make  the  following  remarks  concerning  its 
computability.  We  confine  ourselves  to  iteration  r,  and  therefore 
eliminate  the  explicit  reference  to  the  iteration  number  for  the  sake 
of  simplicity. 

A  CSN  may  have  arcs  in  states  E,  C  and  L.  The  bound  a^  on  the 
permissible  reduction  in  project  duration,  may  be  obtained  by  interpreting 


the  arcs  in  state  E  as  possessing  capacities  equal  to  [ j  ^ ui j  ^  “  ^ij ^  >  ® 
(ij)eE,  and  performing  a  standard  flow  augmentation  step.  Conceptually, 


this  brings  the  marginal  value  of  "cohort  activities"  (i.e.,  activities 
that  lie  on  a  minimal  capacity  cutset  containing  (ij))  to  the  same  level 
as  that  of  activity  (ij)cE,  and  there  would  be  no  inequity  in  the  valua¬ 
tion  of  the  various  activities. 

The  change  in  duration  of  activities  in  state  C  requires  the  solution 
of  a  system  of  K  linear  equations  in  K  unknowns,  similar  to  Case  1, 
Equations  (7).  Unfortunately,  the  matrix  B  no  longer  possesses  the 
desirable  property  of  +  1  or  0  entries:  its  top  m-1  rows  will  indeed 
have  such  entries,  but  the  bottom  (K-nrH)  rows  (corresponding  to  the 
"fundamental  loops")  will  have  fractional  entries.  This  is  due  to  the 
non-zero  slope  of  in  region  C,  which  relates  the  "flow"  variables  f^ 
to  the  amount  of  reduction  d ^ . 

Finally,  it  must  be  noted  that  while  each  path  in  the  CSN  is  shortened 

at  each  iteration,  any  individual  activity  need  not  follow  such  monotone 

behavior:  an  activity  may  be  lengthened  after  having  been  shortened, 

though  no  activity  will  possess  positive  float  if  it  had  once  been  in 

the  CSN.  This  is  not  a  new  result,  since  it  is  observed  even  in  the 

[121 

linear  cost  case 

The  procedure  has  been  programmed  in  FORTRAN  4  on  the  University’s 


IBM  370/165.  Details  and  documentation  may  be  found  in  [18]. 


336 


2.  OPTIMAL  LINEAR  APPROXIMATIONS 


It  is  a  truism  that  project  compression  under  linear  cost  functions 
of  the  form  C(y)  ■  b  -  ay;  i  <  y  <  u;  a,  b>  0,  is  considerably  easier 
to  resolve  optimally  than  under  nonlinear  cost  functions .  The  above 
analysis,  carried  under  the  simplifying  assumption  of  quadratic  cost 
functions  should  amply  demonstrate  the  fact,  if  such  demonstration  were 
needed!  The  natural  question  then  is:  what  if  C(y)  is  not  quadratic, 
though  still  convex  decreasing  as  y  increases  from  £  to  u.  Can  the 
problem  still  be  analyzed?  For  instance,  suppose 

C(y)  ■  a/(b+ky);  0<  !  <  y<  u<«;  a,b,k  >  0  .  (9) 

What  can  be  said  about  the  optimum  in  this  case? 

One  may  wish  to  persist  in  applying  the  theoretical  constructs  of 
the  exact  solutions,  which  are  indeed  applicable  in  toto.  Unfortunately, 
the  KD  will  now  possess  a  nonlinear  segment  in  region  C,  which  would 
necessitate  the  solution  of  a  system  of  nonlinear  equations  in  the 
"flows''  {f^ j }  at  each  iteration;  an  onerous  task  at  best. 

The  other  alternative  is  to  approximate  the  cost  function  C(y)  by 
a  piecewise  linear  and  convex  function  (i.e.,  linear  spline)  that  is 
optimal  in  some  sense.  There  are  two  immediate  questions  that  present 
themselves:  the  first  is  to  define  the  sense  of  the  approximation  itself, 
and  the  second  is  to  define  the  criterion  of  optimality  of  the  approxima¬ 
tion.  We  elaborate  on  these  two  questions. 
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The  approximation  we  seek  should  guarantee  a  deviation  from  the  optimum 


value  that  does  not  exceed  a  prescribed  proportional  error  6.  To  see  what 

this  implies,  let  C(Y)  ■  J  ^..(y.,,),  and  assume  the  optimum  is  achieved 

(ij)eA  13 

at  the  vector  of  durations  y*  ■  {y*^},  corresponding  value  C(Y*) .  Of 

course,  Y*  is  unknown,  and  we  wish  to  approximate  the  value  C(Y*) .  Let 

Hij(^ij)  ^enote  the  piece-wise  linear  and  convex  approximation  to  C^fy^), 

and  let  H(Y)  *  £  H  (y  )  be  the  criterion  function  of  the  linear 

(ij)€A 

program  (LP)  defined  by  the  constraints  (2)-(4).  The  solution  of  this  LP, 
which  is  achieved  relatively  easily,  shall  yield  a  vector  of  activity 
durations,  which  we  denote  by  n*  *  anc*  t*ie  corresponding  value 

H(n*) .  Now,  the  requirement  we  impose  may  be  stated  as  follows:  Select 
the  approximation  to  satisfy  the  inequality 


| C(Y*)  -  H(n*) (  1  6 |C(Y*) 


(10) 


for  any  prescribed  value  6  >  0.  Typically,  6  is  less  than  1,  chosen  from 
the  interval  {  .01,  .10].  Restriction  (10)  simply  ensures  that  the  approxi¬ 
mate  optimum  H(n*)  shall  not  deviate  from  the  true  optimum  C(Y*)  by  more 
than  a  small  fraction  of  the  value  of  the  true  optimum. 


We  now  address  the  issue  of  the  ,fmeasure  of  closeness"  of  the  individual 
approximating  (linear)  functions  H(y)  to  the  original  C(y) :  We 
adopt  the  "maximum  norm"  (Chebychev)  criterion.  In  other  words,  we  seek 
a  piece-wise  linear  and  convex  approximating  spline  whose  maximal  deviation 
from  the  original  function  C^Cy^)  is  minimal  (i.e.,  optimal  in  the  sense 
of  Chebychev) . 
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Our  procedure  comprises  Cvo  basic  steps:  The  first  accepts  the  data 
of  the  original  problem  and  the  specified  6  >  0  and  yields  a  value  e  >  0 
that  represents  the  bound  on  the  maximal  deviation  between  C . .  and  H . 


U 


ij* 


The  second  accepts  e  and  constructs  for  each  function  C^(y)  the  approxi¬ 


mating  linear  spline  H^(y)  that  deviates  from  C^(y)  by  at  most  e  through¬ 


out  the  range  of  y. 


The  details  of  the  construction  are  given  in  [  9  ]  .  The  procedure 


is  programmed  for  the  function  C^(y)  ■  a^j/(b^+k^y)  for  illustrative 


purposes . 


II.  ESTIMATIONS  IN  THE  PERT  MODEL 


One  of  the  main  advantages  of  using  network  analysis  for  project 
planning  and  control  is  the  ability  to  identify  the  activities  that  are 
critical  to  the  achievement  of  the  project  objectives.  In  DANs,  it  is 
relatively  easy  to  respond  to  questions  such  as:  What  is  the  critical 
path(s)?  What  are  the  most  M  cricital  activities?  and  so  forth. 

We  seek  to  develop  the  analogous  results  in  PANs,  such  as  the  PERT 
model.  Clearly,  one  must  phrase  the  questions  in  probabilistic  terms 
such  as:  What  path  (or  paths)  is  the  most  probable  to  be  critical? 

Which  activity  (or  activities)  has  the  highest  probability  of  being 
critical?  What  is  the  probability  that  a  particular  path  is  critical? 

Which  (minimal)  paths  have  a  total  probability  of  being  critical  at 
least  0,  etc.? 

In  the  following  sections  we  formalize  these  intuitive  notions 
and  develop  procedures  for  approximating  their  measures.  We  concentrate 
in  this  brief  report  on  delineating  the  fundamental  concepts  adopted  in 
the  approximations  used,  leaving  the  detailed  accounts  to  a  companion 
report  in  preparation. 

Some  Definitions  and  Basic  Concepts 

Let  P  be  the  set  of  paths  in  the  AN  and  let  p,  denote  the  hth  path 

n  - 

(from  node  _1  to  node  n_) ,  and  let  Z(p^)  denote  its  duration,  h  *  1,2,  ...,  p. 
The  criticality  index  (Cl)  of  a  path  p^  is  denoted  by  CR(p^) ,  and  defined  by 


where  Pr(-)  means  the  "probability  of  ,  and  the  duration 


Z(ph) 


l 

(ij)eph 


ij 


(1 


Here,  represents  the  duration  of  activity  (ij),  a  random  variable 

(r.v.)  presumed  of  known  probability  distribution  function  (pdf). 

CR(Ph)  has  been  estimated  by  several  analytical  and  Monte  Carlo 
sampling  techniques,  the  latter  ranging  from  the  "crude"  to  the  very 
"sophisticated"  ([1,19,20]).  Our  objective  is  to  present  analyti¬ 
cal  approaches  to  the  determination  of  CR(p^)  and  the  other  measures 
stated  above. 

The  criticality  of  an  activity  (ij),  1  £  i  <  j  ^  n,  is  defined  as 
the  sum  of  the  CIs  of  the  paths  containing  it.  We  denote  the  Cl  of 
activity  (ij)  by  the  symbol  CA(ij);  hence. 


CA(ij)  -  l  CR(ph)  ;  (ij)eph  (1 

Ph 

The  procedure  described  in  the  following  section  relies  heavily  on 
the  iterative  algorithm  of  Dodin  [  4  ]  for  the  estimation  of  the  pdf  of 
the  project  completion  time,  which  is  summarized  next  because  of  its 
relevance  to  computing. 

1.  THE  APPROXIMATION  OF  THE  PDF  OF  PROJECT  COMPLETION  TIME 

Dodin  [  4 ]  developed  a  system  of  computer  programs  that  accomplish 


four  tasks: 


1.  Generate  random  AN  of  a  pre-specified  number  of  nodes  N  and  arcs  A; 

1. e.,  a  network  from  the  set  of  networks  with  this  n  and  | A [ ,  in  which 

all  are  equally  probable.  The  procedure  represents  a  slight  improvement 

[131 

over  that  of  Herroelen  and  Caestecki  .  Basically,  there  are  two 
approaches,  which  are  best  discussed  relative  to  the  N  x  N  adjacency 
matrix.  The  first  is  the  "deletion  method",  which  starts  with  the  upper 
triangle  above  the  main  diagonal  full,  then  sequentially  eliminates 
entries  of  1  randomly,  subject  to  the  restriction  that  every  node  is 
connected  to  both  the  origin  as  well  as  the  terminal  nodes  until  the 
desired  number  is  left.  The  second  is  the  "addition  method",  which  re¬ 
verses  the  view  and  starts  with  an  empty  upper  triangle  that  is  to  be 
filled  sequentially  until  the  desired  number  of  arcs  are  present,  subject 
to  the  same  constraint. 

The  choice  of  the  method  to  implement  depends,  obviously,  on  the 
density  of  the  network.  The  deletion  method  is  preferred  if  | A |  > 
n(n-l)/4  +  1. 

2.  Discritize  any  given  continuous  pdf.  Three  approaches  have  been  tried; 

and  the  most  efficient  in  terms  of  accuracy  and  computer  time  is  a  hybrid 
of  the  last  two.  Let  m  denote  the  number  of  discrete  points  that  represent 
F(0*  (i)  the  first  method  assumes  that  both  the  location  x^  and  the 

probability  mass  p  (x^)  =  of  occurrence  of  x^.  are  unknown,  k  -  1,  ...,  m. 
It  is  desired  to  determine  these  2m  unknowns  by  equating  the  first  2m 
moments  of  the  discrete  approximation  to  the  (given)  theoretical  df: 

m 

l  \  Pk  =  V r  ,  for  r  =  0,1,  ... ,  2m-l  (14) 

Ic  1 
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where 


CO 

IT  I  IT 

Pr  =  E(x  )  =  y  dF(y)  ;  the  rth  moment  . 
0 

Equations  (  14  )  may  be  represented  in  matrix  form  as 


VP  =  E 


where  V  is  the  well-known  Vandermonde  matrix  of  dimensions  2m  *  m, 

P  is  the  probability  vector  with  m  components,  and  E  is  the  vector 
of  2m  moments  {yr}.  Two  methods  were  tried  to  solve  this  system  of 
nonlinear  equations,  but,  unfortunately,  neither  succeeded  for  m  >  8. 
This  approach  was  then  abandoned*  (ii)  The  second  method  may  be 
termed  Mthe  equal  Interval  method",  in  which  the  (finite)  range  of 
the  arc  duration  is  divided  into  ra  equal  intervals  of  width  A  each. 
The  finiteness  of  the  range  is  secured  by  defining  two  points  l  and  u 
by 


Pr (X  <  %)  =  0.0005  =  P(x  >  u)  . 

With  the  intervals  defined,  it  is  easy  to  determine  the  probabilities 
{p^},  assumed  to  be  associated  with  {x^.}  at  t*ie  min”P°ints  their 
respective  intervals.  This  method  proved  quite  satisfactory  for  dffs 
that  possess  no  sharp  peaks  or  severe  skewness.  The  method  is  also 
convenient  for  the  use  of  the  Fast  Fourier  Transform  (FFT)  method  in 
the  successive  approximation  discussed  below. 
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(iii)  The  third  method  divides  the  (finite)  range  into  intervals  of 
unequal  length  but  of  equal  probabilities  (*  1/m).  It  is  a  simple 
matter  to  determine  the  intervals  through  the  inverse  function  F  *“(•)» 
and  proceed  assuming  the  probability  concentrated  as  the  mid-point  of 
each  interval. 

A  hybrid  of  approaches  (ii)  and  (iii)  would  use  equal  intervals 
where  the  df  is  flat,  or  nearly  so,  and  reverts  to  equal  probabilities 
where  it  peaks. 

3.  Reduce  the  AN  to  its  irreducible  form,  through  the  operation  of 
addition  (i  convolution  operation)  of  arcs  in  series  and  multiplication 
(=  maximum  operation)  of  arcs  in  parallel.  We  wish  to  make  two  remarks 
on  this  step.  The  first  is  that  the  number  of  discrete  points  m  is 
usually  held  fixed  beyond  a  certain  point,  and  a  reduction  operation  that 
yields  a  number  of  points  larger  than  m  must  be  "folded  back"  to  only 

m  points.  This  introduces  the  first  source  of  error  in  the  approximating 
procedure.  The  second  remark  is  that  it  is  in  this  step  that  the  equal 
interval  method  helps  because  of  our  ability  to  use  the  FFT  in  the  reduc¬ 
tion  process. 

4.  Approximate  the  irreducible  network.  This  is  the  "heart"  (central 
element)  of  the  computer  package  and  embodies  the  concept  of  independence 
that  is  usually  invoked  relative  to  paths  into  any  node  ieN. 

The  iterative  progression  over  the  nodes  of  the  network  employs  the 
convolution  of  F^(t)  with  F^(t),  where  F^(t)  denotes  the  pdf  of  the 

time  of  realization  of  node  ieB^,  and  F^j (t)  is  the  pdf  of  ,  the 
duration  of  activity  (ij).  The  convolution  operation  can  be  performed 
by  either  the  usual  formula  or  through  the  use  of  FFT.  Both  approaches 
were  tested  and  preference  is  given  to  the  usual  formula. 


The  accuracy  of  the  approximation  was  tested  relative  to  two 


measures : 

(i)  The  max-absolute  deviation  max  |  Fn(0  “  F^(t)  ^ 

t  .  n 

(ii)  The  average  value  of  the  absolute  deviation 
00 

|Fn(t)  -  Fn(t) |dt  . 

0 

In  these  expressions,  the  "exact"  pdf  Fn(t)  was  secured  through  MCS.^ 

The  results  are  most  encouraging:  (a)  For  networks  up  to  60  nodes 
and  150  activities,  the  max.  deviation  was  less  than  .08  and  the  average 
deviation  less  than  .03.  (b)  The  max.  value  of  the  absolute  deviation 

occurs  wihtin  the  low  30%  of  the  range  of  the  r.v.  This  is  reassuring, 
since  in  the  study  of  ANs  the  realizations  of  greatest  interest  are 
those  in  the  "right  tail"  of  the  distribution;  i.e.,  in  the  high  30% 
of  the  range.  (c)  The  sampled  distribution  F^(t)  (via  MCS)  converges 
toward  the  approximate  distribution  F^(t)  as  t*ie  sample  size  increases! 
This  is  an  unexpected  result,  since  it  implies  that  the  sampled  df  is 
the  "inexact”  one!  (d)  The  processing  time  of  the  approximate  procedure 
are  quite  reasonable,  being  of  the  order  of  30  secs,  for  networks  of 
size  (N,A)  (60,120)  on  the  AMDAHL  V-7. 

2.  THE  ACTIVITY  CRITICALITY  INDEX 

Let  n^  denote  the  in-degree  of  node  and  assume  the  arcs  into 
±  to  be  numbered  in  increasing  order,  the  same  as  their  originating  nodes; 
see  Figure  6. 

+Monte  Carlo  Sampling. 
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Figure  6.  The  scheme  of  numbering  arcs  into 


We  define  the  cutset  at  node  to  be  the  set  of  arcs  resulting  from 

the  partitioning  of  the  set  of  nodes  N  into  two  mutually  exclusive  sub¬ 
sets  S.  and  T.  where 
J  J 


Sj  =  {ieN  :  i  <  j}  and  Tj  =  {ien  :  i  >  j> 


Hence 


C,  =  {(hk)eA  :  heS,  and  keT.} 
j  j  J 


(15) 


Let  Tj  denote  the  duration  of  the  longest  path  forward  from  node  1_  to 

node  i^,  and  Tj  denote  the  duration  of  the  longest  path  backwards  from 

node  n  to  node  j.  Clearly,  both  and  are  r.v.'s.  Let  W.  .  denote 
-  j  j  ij 

the  duration  of  the  longest  path  containing  arc  (ij);  then 


T  +  Y 
i  ij 


+  t: 


(16) 
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(1 


where  -  max  {Z(p^i>  }  and  *  max  { Z (PNj ) } 


li 


Nj 


and  p^j  denotes  a  path  from  to  j_. 


Proposition  5  W 


ij 


max  { Z (Pj^)  }  ;  Z(p^)  as  defined 

Ph 

<ij)eph 


in  (12)  . 

The  proof  is  by  direct  substitution  in  the  definition  of  in  (16) . 

Now,  the  exact  value  of  the  Cl  of  arc  (ij)  is  given  by  (see  (13) 
and  (11) ) : 


CA(ij)  -  l  CR(p.) 

<iJ>e"h 


-  I  Pr(Z(p,  )  >  Z(p  )  for  all  p  eP)  (l1 

(ij)eph  q  q 

Let  L(ij)  denote  the  subset  of  paths  that  contain  arc  (ij)  .  Then, 
clearly,  CA(ij)  measures  the  "weight"  attached  to  the  event  that  any 
PheL(ij)  is  longer  than  any  other  path  in  the  network.  Unfortunately, 
it  is  extremely  difficult  to  calculate  CA(ij)  directly  from  (18) 
because  of  the  need  to  identify  the  set  of  all  paths  in  the  network  and 
to  calculate  the  corresponding  CR's.  To  obviate  that  need,  we  appeal 
to  the  concept  of  "cutset"  defined  above.  Let 

CAP(ij)  *  Pr(W  >  W  for  all  (ki)eC.)  (li 


where  is  as  defined  in  (15),  Close  scrutiny  reveals  that  CAP(ij) 

measures  the  probability  that  the  maximum  of  the  paths  in  the  subset 

L(ij)  is  longer  than  the  maximum  of  the  paths  not  containing  (ij),  (i.e., 

in  the  complementary  set  L(ij)  =  P  -  L(ij)).  These  latter  paths  are 

precisely  the  paths  that  contain  all  the  arcs  in  C_.  except  arc  (ij)  , 

It  is  demonstrated  in  [  6 ]  that  CAP(ij)  always  underestimates  CA(ij) , 

But  for  the  moment,  we  ascertain  the  iterative  manner  in  which  C,  is 

3 

obtained  from 

Proposition  6  Cj  *  C^+1  +  {(ij)eA  :  i  <  j }  -  {(jk)eA  :  k  >  j} 
with  the  initial  condition 


CN  =  {(iN)eA) 


where  A  is  the  set  of  activities  in  the  network. 


The  proof  is  achieved  by  induction,  starting  with  node  N-l. 


Proposition  7  CAP(ij)  £  CA(ij)  for  all  (ij)eA 

We  argue  heuristically  as  follows.  If  any  path  p^£L(ij)  is  longer  than 
any  path  p^eL(ij) ,  then  h  fortiori,  max{Z(p^)  :  p^cL(ij) }  is  longer 
than  max{Z(p^)  :  p^eL(ij)}.  Therefore,  the  set  on  which  CA(ij)  is 
defined  contains  the  set  on  which  CAP(ij)  is  defined.  Moreover,  the 
sum  of  probabilities  defining  CA(ij)  is  no  less  than  the  probability  of 
the  union  of  the  events  in  the  set,  which  is  no  smaller  a  set  than  that 
defining  the  probability  of  CAP(ij)  .  Consequently,  CAP(ij)  <_  CA(ij)  . 


Proposition  8  For  any  node  i  *  1,  n, 

l  CA(ji)  -  l  CA(ij)  (21 

jeB(i)  jeA(i) 

where  the  sets  A(i)  and  B(i)  denote  the  sets  of  nodes 
connected  to  and  occurring  after  it  and  before  it, 
respectively . 

The  proof  is  accomplished  by  defining  the  Cl  of  node  i_,  CN(i)  as 
follows : 


CN(i)  -  l  CR(ph)  . 

Ph 

i£Ph 

That  is,  the  Cl  of  a  node  is  the  sum  of  the  CIs  of  the  paths  containing 
that  node.  Then  it  is  easy  to  show,  by  appealing  to  definitions,  that 
CN(i)  is  equal  to  each  side  of  equality  (20) . 

Two  immediate  consequences  of  Proposition  8  follows: 

Corollary  1  CN(1)  -  CN(n)  -  £  CR(p  )  >_  1.0  (2] 

Ph£P 

The  two  equalities  in  (21)  are  rather  obvious;  and  the  last  inequality 
follows  from  the  definition  of  CR(p^)  as  probability,  and  the  fact  that 
the  paths  in  the  network  are  not  necessarily  independent.  (Equality  to 
1.0  is  achieved  only  when  the  paths  are  independent  and,  in  the  case  of 
discrete  pdf's,  no  two  paths  are  critical  simultaneously.) 

Corollary  2  The  Cl  of  any  path  is  equal  to  the  Cl  of  any  unique 
arc  on  the  path  (i.e.,  an  arc  that  belongs  to  no  other  path).  Moreover 


all  unique  arcs  on  the  same  path  have  the  same  Cl. 


We  now  concern  ourselves  with  the  computation  of  CAP(ij).  Let 
V(ij)  ^enote  the  maximum  duration  of  paths  not  containing  arc  (ij). 

V(ij)  =  max  > 

(hk)eCj 

Chk)cLCij) 

Consequently,  from  (19) ,  we  may  write 
CAP (ij )  *  Pr[W..  >  V(ij)] 

in  which  each  W  is  given  by  (16)  ,  (rs)  C  .  Thus  the  problem  of  deter- 
rS  J 

mining  the  value  of  CAP(ij)  reduces  to  calculating  the  pdf  of  the  r.v.'s 
£  b 

and  Ts,  (rs)eCj,  and  performing  the  necessary  convolution  and  maximum 
operations.  But  the  calculation  of  the  (approximate)  pdf’s  of  and 
is  precisely  the  problem  discussed  in  [  4  ] .  As  is  mentioned  there, 
the  causes  for  the  errors  in  estimation  of  these  pdf's  are  three: 

(i)  the  discretizing  of  continuous  distributions  (if  any);  (ii)  the 
assumption  of  independence  of  paths;  and  (iii)  the  reduction  of  the 
domain  of  the  computed  pdf's  to  a  predetermined  (small)  number  of 
discrete  points.  Most  importantly,  in  [  4  ]  it  is  demonstrated  that  the 
approximation  to  the  pdf  of  W  may  either  overestimate  or  underestimate 
the  true  pdf.  Consequently,  the  approximation  to  CAP(ij),  denoted  by 
ACAP(ij),  cannot  be  asserted  to  be  an  underestimate  of  CA(ij),  though 
empirical  evidence  in  [  4  ]  indicates  that  the  approximation  is  excellent. 

Finally,  it  was  demonstrated  in  Cor.  1  that  CN(1)  ^  1.0.  In 
case  CN(N)  >  1,  which  is  almost  always  true,  it  is  advantageous  to 
normalize  all  CIs,  for  arcs  and  nodes,  by  dividing  throughout  by  CN(N), 
because  such  normalization  reduces  the  maximum  error  in  the  estimation 


of  activity  CIs.  Indeed,  the  original  maximum  error  * 

max  (|CA(ij)  -  CAP(ij)|),  while  the  normalized  maximum  error  is 
(ij)eA 

precisely  1/CN(N)  of  its  value.  A  minor  benefit  of  such  normalization 
is  that  the  various  CIs  may  be  thought  of  now  as  probabilities,  which 
was  not  possible  before. 


3.  THE  HIGHEST  K-CRITICAL  PATHS 


The  title  implies  either  of  the  following  two  problems: 

(i)  the  identification  of  the  minimum  set  of  paths  whose  probability 
of  being  "critical",  i.e.,  that  any  member  of  the  set  is  of  no  shorter 
duration  than  any  path  in  the  network  that  is  not  in  the  set,  is  at 
least  S;  0  <  6  <  1,  (typically  6  >_  0.50);  (ii)  the  identification 
of  the  K  "most  critical"  paths  in  their  rank  OTdeT,  i.e.,  the  path(s) 
with  the  highest  probability  of  being  critical,  the  path(s)  with  the 
next  highest  probability  of  being  critical,  and  so  forth  to  the  K£h 
ranking  path(s). 

The  theoretical  discussion  presented  below  leads  to  an  approxi¬ 
mating  procedure  that  emulates  that  used  in  DANs  to  identify  the  first 
K  CPs. 


Let  P(j)£p  denote  the  set  of  paths  ending  in  node  PjE^Cj) 


denote  the  kth  ranked  "CP"  ending  in  node  j_,  and  Z(pj)  its  duration. 

k  r  lc  r 

We  say  that  p^  dominates  p^  in  probability,  denoted  by  p^  p^  if 

Pr[Z(PJ)  >  Z(p*)]  >  Pr[Z(p')  >  Z(p*)]  (22) 


The  following  assertion  is  an  immediate  consequence  of  the  above 
definition. 
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Proposition  9  Let  and  be  two  paths  ending  in  node  i_ 

such  that  p*  >  p?.  Then  pj  =  [p*o(ij)]  >  [p?°(ij)l  ■  Pj» 
where  "o"  denotes  the  catenation  (or  extension)  of  the 
path  from  the  right. 

In  words,  this  Proposition  asserts  that  adding  activity  (ij)  to  both 
paths  does  not  alter  their  relative  ranking,  which  is  eminently 
plausible. 

Unfortunately,  dominance  in  probability  as  defined  in  (22)  is, 
in  general,  intransitive,  contrary  to  prima  facie  expectations. 

1  2  2  3  1  3 

That  is,  if  pi  >  and  >  p^,  then  the  relation  pi  ^  p^  need  not 

be  true  in  general,  but  is  true  for  symmetric  distributions  *  To  see 

1  2 

this,  note  that  the  dominance  relation  p^  ^  p^  implies  that 
Pr[Z  (pj)  >  Z(p*)]  >  1/2.  We  thus  have 

Pr[Z  (pj)  -  Z  (p*)  -  (wx  -  P2)  >  -  (u1  -  v2)]  >  1/2  (23) 

where  pf  *  E[Z  (pT)].  Now,  assuming  symmetric  pdf's,  it  is  clear 

1  2 

that  the  r.v.  [Z  (p^)  -  Z  (p^)  ]  is  also  symmetric  about  its  mean 

(pj  -  Pj) *  and  inequality  (23)  implies  that  -(p^  -  P2)  — 

2  3 

yl  —  w2*  Similarly,  the  dominance  relation  Pj^  >•  p^^  implies  that 
P2  2.  ^3  under  the  assumption  of  symmetric  pdf's.  We  therefore  con¬ 
clude  that  P^  ^  p^«  N<w  reversing  the  argument  we  conclude  that 
Pr[Z  (p^)  _>  Z  (p^)  _>  1/2,  which  finally  implies  that  We 

have  just  proved 
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Proposition  10  If  the  paths  to  a  node  possess  symmetric  pdf's, 

then  dominance  in  probability  as  defined  in  (23)  is 

1  2 

transitive.  Furthermore,  p^  >  pT  Vj  >  Uj*  where 

Wr  -  E[Z(p*)]. 

The  utility  of  these  conclusions  is  evident  for  the  result 

sought.  High-numbered  nodes  are  the  ones  most  prone  to  having  a  large 

number  of  paths  (from  node  1_)  leading  into  them.  (One  can  easily 

verify  that  node  £  has  1  path,  node  £  has  <  2  paths,  node  £  has  £  4 

paths,  node  £  has  £  8  paths  and  node  £  has  £  16  paths.  In  general, 
i-2 

node  £  has  £  2  paths  leading  into  it  from  node  £. )  But  these  paths 
are  precisely  the  ones  whose  pdf's  may  be  approximated  by  symmetric 
distributions  (usually  the  normal  pdf) .  Consequently,  they  are  the 
ones  to  which  the  assumption  of  transitivity  of  dominance  is  appropri¬ 
ate.  Lower -numbered  nodes  are  not  in  need  for  such  approximation  since 
their  paths  may  be  explicitly  enumerated  and  ranked. 

The  algorithm  alluded  to  at  the  beginning  of  this  section  is 
now  apparent,  and  its  gist  is  as  follows.  For  any  node  £,  let  its 

immediately  preceding  nodes  be  i.,^.  •••»  and  suppose  that  the 

12  k 

first  K  CPs  to  node  i  have  been  identified  as  p^  V  p  >  ...^p.  .  To 

r  r  r 

determine  pj,  the  most  critical  path  to  node  £,  we  need  to  compare 

only  the  s  topmost  r.v.'s{Z[p*  o(ifj);  r  =  1,2,  ....  s]}and  rank  them. 

r  1 

The  highest  ranking  path  (in  probability)  is  p  ^ .  Now,  the  second 
ranking  path  in  this  set  is  compared  with  the  second  ranking  path  in 
the  set  of  paths  to  which  the  topmost  path  belonged;  and  the  higher 
ranking  (in  probability)  between  these  two  paths  is  p^ ;  and  so  on  for 
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Pj ,  k  *  3,4,  . ..,  k.  At  termination,  the  answer  to  the  two  problems  posed 
at  the  beginning  is  in  hand: 

12  K 

(i)  The  K  most  CPs  of  the  network  are  precisely  pA,  p  ,  ....  p  . 

n  n  n 

(ii)  The  minimum  set  of  paths  whose  probability  is  no  less  than  6 
is  easily  obtained  from  the  most  critical  paths  into  node  n. 

Computing  experience  with  this  approximating  procedure,  and  compari¬ 
son  with  the  results  obtained  from  extensive  MCS  in  determining  the  most 
critical  three  paths  in  networks  of  varying  sizes  reveal  three  signifi¬ 
cant  facts:  (i)  the  match  between  the  approximating  procedure  and  MCS 
decreases  with  path  rank:  there  was  93%  matches  in  path  rank  1,  71% 
matches  in  rank  2,  and  only  57%  matches  in  rank  3;  (ii)  the  approximate 
procedure  consumes  significantly  less  time  than  MCS  (approximately 
an  order  of  magnitude  less);  (iii)  In  several  Instances,  the  set  of 
paths  identified  by  both  the  approximate  procedure  and  MCS  were  identical, 
but  the  rank  of  the  paths  within  the  two  sets  was  different.  This  is 
encouraging  since  it  implies  that  both  approaches  would  identify  the  same 
set  of  arcs  as  critical;  (iv)  Experimentation  with  MCS  was  necessarily 
aborted  at  small-size  networks  (n  *  30  and  | A [  *  90),  since  larger  networks 
would  have  required  inordinately  large  amounts  of  time.  This  phenomenon 
was  not  experienced  by  the  approximate  procedure  because  of  the  difference 
in  complexity  between  the  two:  MCS  requires  the  enumeration,  identifica¬ 
tion,  and  comparison  of  paths,  which  is  a  process  that  is  of  exponential 

2 

complexity,  while  the  approximate  procedure  is  of  complexity  0(n  ) . 


III.  REDUCIBILITY  OF  ANs 


The  third  area  of  investigation  is  related  to  the  problem  of  "reduc¬ 
tion"  of  ANs,  which  rears  its  head  in  more  than  one  investigation  in  the 

context  of  ANs.  (For  a  description  of  three  such  investigations,  see 

r  3  1 

Colby  and  Elmaghraby  .)  Here  we  limit  our  attention  to  the  PERT 
model  and  ask  the  question:  What  is  the  d.f.  of  the  time  of  realization 
of  the  "terminal  node"  of  the  network  (which  signifies  the  completion 
time  of  the  project)?  Now,  it  is  well-known  that  two  activities  in 
series  may  be  collapsed  into  one  activity  whose  d.f.  is  given  by  the 
convolution  of  the  two  individual  d.f.'s.  On  the  other  hand,  two 
activities  in  parallel  may  be  collapsed  into  a  single  activity  whose 
d.f.  is  given  by  the  product  of  the  two  individual  d.f.'s.  If  the 
original  network  can  be  collapsed  into  a  single  activity  (l,n)  then, 
indeed,  the  analytical  form  of  the  d.f.  of  the  duration  of  the  project 
is  in  hand.  Unfortunately,  the  irreducibility  of  such  PERT  networks 
prohibits  such  (conceptually  easy)  analytical  determination,  which,  in 
turn,  gave  rise  to  various  approximating  or  bounding  procedures  discussed 
in  II  above. 

Consequently,  we  say  that  a  digraph  is  reducible  if  either  of  the 
following  two  conditions  is  satisfied: 

(a)  There  exists  at  least  one  path  with  node(s)  of  in-degree 
one  and  out-degree  one  (i.e.,  the  path  contains  two  arcs 
in  series) 

(b)  There  exist  at  least  two  paths  "in  parallel";  i.e.,  there 

are  two  distinct  nodes  i^  and  j_,  1  <  i  <  j  <  ,  and  two  dis¬ 

tinct  paths  from  i.  to  with  the  property  that  if  there  is  an 


intermediate  node  on  either  path  between  i^  and  j[,  then  it 
is  of  in-degree  and  out-degree  one. 

The  "reduction  process”  amounts  to  the  collapsing  of  two  arcs 
into  one,  starting  with  arcs  in  series  (the  process  may  alternate 
between  combining  arcs  in  series,  then  arcs  in  parallel,  then  arcs 
in  series  that  have  been  created  by  the  arcs  in  parallel;  etc.).  A 
digraph  is  said  to  be  completely  reducible  if  it  is  collapsible  to  a 
single  arc  joining  nodes  1_  and  n.  Otherwise,  we  terminate  with  a 
graph  that  is  irreducible  (which  is  shorthand  for  "not  completely 
reducible").  Then,  evidently,  both  conditions  (a)  and  (b)  are  not 
satisfied . 

The  problem  of  irreducibility  of  ANs  has  been  recognized  by  every 
researcher  in  the  field  since  the  classical  paper  of  Malcolm  et  al^^^ 
on  the  PERT  model;  (for  citations,  see  Elmaghrabyfs  book  [  8  ],  Ch.  4). 
Consequently,  it  is  natural  to  inquire  into  the  conditions  under  which 
a  day  is  irreducible.  To  this  end  we  introduce  some  definitions  and 
notation. 


Figure  7.  The  interdictive  graph  (IG). 


The  "Interdictive  Graph"  (IG)  is  the  graph  shown  in  Figure  7. 

Evidently ,  it  is  irreducible 

We  write  IN(i)  and  OUTCi)  as  shorthand  for  the  in-degree  and 
the  out-dgreee  of  node  i.,  respectively. 

We  write  NS  (a)  and  NE(a.)  for  the  "start  node"  and  "end  node" 
of  arc  aeA,  respectively. 

By  a  descend ent  of  node  i  we  mean  a  node  j  >  ^i  which  is 
connected  from  i  by  an  arc  or  a  path. 

The  set  of  all  nodes  that  connect  tjo  node  ^  by  a  path  is 
denoted  by  P(j)  ;  i«e.,  PQ)  =  {i  e  N  :  jL  <  and  i_  connects 
to  by  a  path} . 

Properties  o f  Irreducible  Digraphs  ( IDG) 

The  following  properties  of  IBGfs  are  easy  to  verify.  For  the  sake 
of  brevity  we  shall  not  clutter  this  note  with  their  proofs.  They  are 
numbered  consecutively  from  the  previous  two  properties: 

(3)  The  number  of  nodes  |n|  >_  4. 

(4)  Either  0UT(1)  =  lf  in  which  case  IN 02)  =  OUTU)  =  1  and  0UT(2)  _>  2; 

or  OUT(l)  >_  2.  therefore,  without  loss  of  generality,  we  can  take 
the  IDG  to  start  at  node  _1  whose  0UT(1)  2, 

(5)  Either  IN(n)  «  1,  in  which  case  OUT (n  -1)  *  IN(n)  -  1  and  XN(n-I)  _>  2; 
or  IN(n)  >  2.  Similarly,  we  can  assume  that  IN(n_)  >_  2. 

(6)  There  exists  a  smalles t-numbered  node  ^  !,»  A  suc^  that  2_,  G  p(i  ) 

(2)  (3) 

and  the  paths  and  fl^  -  (Ijj,...,^)  are  independent 

(i.e.*,  they  have  no  intermediate  node  in  common).  (Note  that  the  existence 
of  such  a  node  is  guaranteed  by  the  fact  that  both  2_  and  ^  are  €  P(n); 
see  Property  2*) 

(7)  For  all  i.  f  1_#  n_»  IN(i)  4*  OUT(i)  _>  3,  hence  there  are  no  arcs  in 
(simple)  series. 
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The  main  results  of  Elmaghraby  and  Dodin  [10]  is  the  following: 
Proposition:  A  digraph  is  irreducible  iff  it  contains 

the  IG. 

The  proof  of  this  assertion  is  elementary  and  rests  on  the  properties 
of  the  IG  enumerated  above • 

Two  questions  flow  naturally  from  this  result.  The  first  is: 

How  to  detect  (efficiently)  the  presence  of  the  IG,  and  if  there  is 

more  than  one,  determine  their  count  and  their  identity?  And  the 

second  is:  What  is  the  most  economical  way  (in  the  sense  of  minimum 

arcs)  to  "fix”  in  order  to  render  the  network  completely  reducible. 

The  issue  of  detection  is  easily  answered  by:  do  the  reduction 

(which  is  easily  defined  in  polynomial  time  (0(m)),  and  if  the  trivial 

network  (of  only  one  1  *  n  arc)  is  ncr  achieved,  then  the  IG  must  exist. 

Henceforth,  we  refer  only  to  the  remaining  two  questions. 

It  Is  our  contention  that  either  of  these  two  questions  poses  a 

r  2  1 

problem  that  is  NP-Hard .  Colby  demonstrated  that  both  problems  are 
in  the  class  NP.  He  also  proposed  a  heuristic  procedure  that  is  of 

4 

polynomial  complexity  (0(n  ))  that  dominates  an  earlier  procedure  by 
Dodin ^  ^  see  the  paper  by  Colby  and  Elmaghraby^  ^  ^  for  details  and 
examples . 

One  final  remark.  Despite  the  fact  that  interest  in  the  minimal 
number  of  arcs  to  "fix"  springs  from  the  desire  to  secure  the  exact  (or 
approximate)  pdf  of  the  time  of  realization  of  node  j ,  it  is  easy  to 
show  (as  was  demonstrated  by  Dodin^^  )  that  fixing  the  minimum  number  of 
arcs  is  also  useful  in  bounding  the  pdf  from  below.  That  is,  if  one  is 
not  interested  in  determining  the  exact  pdf  (through  multiple  integration 
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In  the  transient  case,  the  finite  set  of  differential  equations, 
and  in  the  steady-state  case,  the  finite  set  of  difference  equations, 
are  solved  by  numerical  techniques.  The  adequacy  of  these  techniques 
for  yielding  solutions  to  practical  systems  is  also  discussed. 
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MARKOV  MODELS  OF  MULT I -ECHELON,  REPAIRABLE-ITEM 
INVENTORY  SYSTEMS 

by 

Donald  Gross 
Douglas  R.  Miller 

1.  Introduction 

Consider  a  typical  multi-echelon  repairable-item  inventory  system 
as  shown  schematically  in  Figure  1.  Shown  there  is  a  two  location 
(bases) ,  two  level  of  supply  (spares  at  bases  and  depot) ,  two  level  of 
repair  (base  and  depot)  system  which  we  shall  denote  as  a  (2,2,2)  sys¬ 
tem.  The  nodes  BUi  (i  =  1,2)  represent  operating  and  spare  units 
(we  consider  for  now  only  a  single  item  such  as  a  final  assembly  or  a 
key  component)  at  base  i  ,  BRi  (i  =  1,2)  represent  the  repair  fa¬ 
cility  at  base  i  ,  DU  represents  depot  spares,  and  DR  the  depot 
repair  facility. 

Our  goal  is  to  develop  exact  mathematical  models  for  such  finite 
calling  population  (finite  number  of  items),  finite  repair  capacity, 
repairable  item  provisioning  systems  in  both  time-varying  and  steady- 
state  environments.  Specifically,  we  wish  to  find  the  state  probability 
vector  (the  probability  distribution  for  the  system  being  in  its  various 
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Figure  1.  Multi-echelon,  repair¬ 
able  item  system. 


possible  states)  which  will  allow  us  to  then  calculate  measures  of  per¬ 
formance  such  as  availability  (the  probability  that  at  least  some  desir¬ 
able,  prespecified  number  of  components  is  operational).  Ultimately, 
these  models  will  be  used  to  yield  the  optimal  combination  of  spares  and 
repair  channels  at  each  location  in  the  system. 

Assuming  times  to  component  failure  and  component  repair  times 
to  be  exponentially  distributed  random  variables,  we  have  a  continuous 
time  Markov  process  (CTMP) .  The  process  is  driven  by  a  rate  matrix 
Q  =  (q ± ,  where  q± ^  is  the  "rate"  of  going  from  state  i  to  state 
j  ;  that  is,  letting  X(t)  represent  the  system  state  at  time  t  , 


=  lim 
At-K) 


) 


Pr{X(t+At)  =  j|X(t)  =  i}' 


At 


i  i 


(iJj) 


For  example,  suppose  the  (2,2,2)  system  pictured  in  Figure  1 
is  in  a  state  (call  it  i  )  for  which  the  depot  spares  pool  is  not 
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empty  (at  least  one  spare  is  on  hand  at  the  depot) .  Suppose  we  consider 

the  event:  a  component  fails  at  base  1.  Describing  this  state  i  by 

the  vector  (nBul,  nBR1 ,  n^,  nRR2 ,  nDR)  ,  where  nR  denotes  the 

number  of  components  at  node  k  in  the  "network,"  this  event  takes  the 

system  to  a  state  j  ,  namely,  (nm,  n^,  nBu2,  nRR2,  n^-1,  n^+1)  , 

at  the  rate  q..  =  Xot,  n_,TT1  ,  where  1/X  is  the  mean  time  to  failure 
ij  1  BUI 

(MTTF)  of  a  component  and  is  the  probability  (or  percentage)  of 

failed  items  requiring  depot  repair. 

If  we  denote  the  state  probability  (row)  vector  at  time  t  by 
tt  ( t )  =  (jr^(t),  tt^  ( t )  ,  .  ..,  7Tg(t)3  »  that  is,  the  ith  element,  7K(t)  * 
is  the  probability  of  the  system  being  in  state  i  at  time  t  (there 
is  a  finite  number  of  states  [call  this  number  S]  even  though  this  number 
can  be  quite  large),  then  we  must  solve  the  finite  set  of  first-oruer, 
linear  differential  (Kolmogorov)  equations 

TT'  (t)  =  TT  (t)Q  .  (1) 

For  steady-state  solutions,  we  are  required  to  solve  the  finite  set  of 
linear  algebraic  steady  state  equations. 


0  =  ]TQ  ,  (2) 

where  tt  =  (tt^,  tt^,  tt^)  is  the  steady-state  probability  vector  and 

0  is  a  row  vector  of  all  zeroes.  In  both  steady-state  and  transient 
cases  we  have  the  further  condition  that  the  probabilities  sum  to  one, 
namely, 


1  =  Tr(t)e  -  JTe  , 

where  e  is  a  column  vector,  with  all  components  equal  to  1. 
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We  are  often  interested  in  what  happens  to  such  systems  in  a  time- 


varying  environment.  For  example,  a  sudden  increase  in  effort  (say  a 
peacetime  to  wartime  shift)  may  cause  a  sudden  decrease  in  MTTF.  In 
such  situations,  it  is  necessary  to  have  tt ( t )  ,  and  we  must  solve  the 
finite  set  of  first  order  linear  differential  equations  given  in  (1). 
Except  for  very  small  systems  (one  or  two  states)  analytical  tech¬ 
niques  such  as  Laplace  transforms  are  intractable.  Since  we  have  a  fi¬ 
nite  set  of  equations,  numerical  methods  can  be  employed.  Numerical  in¬ 
tegration  schemes  such  as  Runge-Kutta  or  predictor-corrector  methods 
are  possibilities.  We  choose  a  different  approach,  however,  which  is 
referred  to  by  some  as  randomization,  and  has  been  shown  to  be  more  ef¬ 
ficient  for  these  kinds  of  problems  [see  Arsham,  Balana,  and  Gross  (1983) 
or  Grassmann  (1977a)].  For  details  on  this  technique,  which  can  be 
derived  by  a  probabilistic  argument  when  viewing  the  CTMP  in  a  certain  way 
see  Grassmann  (1977a  and  b)  or  Gross  and  Miller  (1984a  and  b) . 

The  computational  formulas  are  as  follows.  Consider  a  discrete 
time  Markov  chain  (DTMC)  with  single-step  transition  probability  matrix 

P  -  Q/A  +  I  , 


where 


A  «  max  | q 


li ' 


that  is,  A  is  the  maximum  of  the  absolute  values  on  the  diagonal  of 
the  Q  matrix.  Since  a  diagonal  element  of  Q  is  the  negative  of  the 
sum  of  the  other  elements  in  the  row  (rows  of  the  Q  matrix  sum  to 
zero),  A  is  actually  the  absolute  value  of  the  minimum  (largest 
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negative)  diagonal  element  of  the  matrix.  This  DTMC  is  referred  to  as 

(k) 

a  uniformized  embedded  DTMC  of  the  CTMP.  Denoting  by  £  '  the 

state  probability  vector  of  this  DTMC  after  k  transitions,  it  can  be 
shown  (see  the  above  cited  references)  that 


TT  (t)  =  l  $ 


k=0 


(k)  (At)k  e~At 
j  k! 


For  computational  purposes,  it  is  necessary  to  truncate  the  infinite 
sum.  The  truncation  error  can  be  easily  bounded  since  we  are 
discarding  a  Poisson  "tail,"  so  that  the  computational  formula 

becomes 


TT.  (t) 


T(t,e) 

1 


k=0 


<J> 


(k)  (At)' 


k! 


-At 

e 


(3) 


where 


T(t,e)  =  min 


N 

1 

n=0 


e-At(flt)n 

n! 


>  1  - 


£  being  the  maximum  tolerable  error  (specified  by  the  user) .  One 
advantage  of  this  method  over  numerical  integration  is  an  exact  bound 
on  the  computational  error. 

The  major  computational  effort  in  using  (3)  is  now  reduced  to 

(k) 

finding  the  state  probability  vector,  £  *  of  the  uniformized  em¬ 

bedded  DTMC.  This  can  be  readily  accomplished  by  the  usual  recursion, 


$(0)  =  tt(0)  ;  $(k+1)  =  £(k)P  .  (4) 

Cross  and  Miller  (1984a)  give  a  more  efficient  procedure  than  the  suc¬ 
cessive  vector-matrix  multiplication  of  (4),  which  takes  advantage  of 
the  sparsity  of  the  P  matrix. 
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3.  Steady-state  Environment 


Solving  for  the  steady-state  probability  vector  tt  requires 
solving  the  set  of  linear  algebraic  equations  of  (2) .  Since  one  of  these 
equations  is  redundant,  it  is  necessary  to  reduce  the  equation  set  by 
one  and  use  1  =  TTe  as  the  final  equation.  Thus  (2)  can  be  reformulated  as 

b  *  7TA  ,  (5) 

where  b  is  a  vector  of  all  zeroes,  except  for  the  last  element,  which 
is  a  1,  and  A  is  the  Q  matrix  with  the  last  column  replaced  by  lTs. 

For  relatively  small  systems,  the  solution  can  be  obtained  by 
inverting  A  to  get 


TT  =  bA  1  . 

However,  for  most  realistic  problems,  the  state  space  (and  hence  dimen¬ 
sion  of  the  A  matrix)  is  too  large  to  obtain  A  ^  efficiently  or 
accurately.  This  situation  suggests  iterative  procedures  such  as 
Jacoby  or  Gauss-Seidel . 

Consider  the  A  matrix  as  a  sum, 

A  =  L  +  D  +  U  , 

where  L  is  a  lower  triangular  matrix,  D  is  a  matrix  with  only  diagonal  el< 
ments,  and  U  is  an  upper  triangular  matrix.  Then  (5)  can  be  written  as 

tt  (L+D+U)  =*  b 


or 


ttD  =  b  -  tt(L-HJ)  . 

We  can  use  (6)  in  an  iterative  fashion, 


TT^n+1)D  =  b  -  7T(n)  (L+U)  , 


(6) 


(7) 
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where  we  begin  the  procedure  with  some  initial  guess,  say  .  This 

procedure  is  called  Jacoby  iteration.  Note  that  in  performing  the  cal¬ 


culations,  since  D  is  a  diagonal  matrix,  we  compute  TT 


(n+1)  _ (n+1) 


^(n+l)  ^  ^  successively.  If,  as  we  compute  the  'TT^n+'^  ,  we  replace 

the  TT^n^  on  the  right-hand  side  [e.g.,  in  computing  5  the 

(n)  .  ,  ,  ,  (n)  r  (n+1)  (n+1)  (n+1) 

it  vector  is  modified  to  be  tt  =  nr,,  ,  tt.  .....  tt.  ,  , 

—  —  v  0  1  j-1 

. ..,  *  this  procedure  is  referred  to  as  Gauss-Seidel 

iteration,  and  in  matrix  representation  is 

(UT+D)Tr^n+1^  =  b  -  LTtt^  ,  (8) 

T  T 

where  7 r  and  b  are  now  column  vectors,  and  U  ,  L  are  the  trans¬ 
poses  of  U  and  L  ,  respectively. 

Two  questions  remain  to  be  answered  concerning  use  of  the  itera¬ 
tive  procedures  of  (7)  or  (8);  namely,  (i)  do  the  procedures  converge, 
and  (ii)  when  should  the  iterations  be  terminated?  In  general,  these 
procedures  may  not  necessarily  converge,  although  for  our  well-structured 
Markov  process  convergence  will  take  place.  The  stopping  criterion  gen¬ 
erally  used  is  the  Cauchy  criterion,  namely,  stop  when 

max  7T^n+1^  -  ir{n^  <  en  ,  (9) 

l  l  0  7 

l 

where  is  an  "arbitrarily"  chosen  small  number.  We  found  using  the 

fractional  difference  version  of  (9),  namely,  stop  when 


(n+1)  (n) 

TT  .  -  TT  . 

1  1 


<  » 


to  be  somewhat  more  effective.  While  there  has  been  some  success  in 


using  Gauss-Seidel  (G-S)  on  Markov  models  [see  Kaufman,  Gopinath,  and 
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Wunderlich  (1981)],  problems  exist  with  respect  to  rate  of  convergence 
and  appropriate  stopping  criteria.  The  G-S  convergence  rate  can  often  be 
improved  by  using  overrelaxation,  that  is,  by  weighting  with  a  coeffi¬ 


cient  greater  -than  one  the  tt 


(n+l)  (n+l) 


(n+l)  ,  .  , 

n  ,  tt.  ,  ...,  tt.  _  used  in  calcu- 
0  1  j-1 

lating  [see  Kaufman,  et  al .  (1981)  or  tlaron  (1982)]. 


Usually,  the  G-S  procedure  is  applied  to  a  set  of  equations  with 
a  nonsingular  matrix  (such  as  A  ) .  Consider  a  nonsingular  matrix  M 
with  positive  diagonal  elements  arid  negative  off-diagonal  elements. 

The  G-S  procedure  is  known  to  converge  for  sets  of  equations  with  such 
an  M  matrix  [see  Varga  (1963)].  Now  consider  equation  set  (2),  namely, 


0  =  ttQ  . 

Multiplying  through  by  -1  gives 

o  -  TT  t-Q  ]  , 

where  -Q  has  positive  diagonal  elements  and  negative  off-diagonal  ele¬ 
ments.  However,  it  is  singular,  since  one  equation  of  this  set  is  re¬ 
dundant.  Suppose  we  arbitrarily  set  tt  (assuming  there  are  S  states) 

O 

to  one,  remove  the  last  row  of  the  Q  matrix  (call  this  reduced  matrix 
Q  ) ,  and  consider  solving  the  reduced  S-l  x  S-l  set  of  equations 

0  =  ]T  [-Q  ]  . 

Now  -Q  is  an  M  matrix  and  convergence  is  guaranteed.  Of  course  the 
resulting  tk  values  are  relative  to  Trg  =  1  so  that  they  must  be  renor- 
malized  by  dividing  each  by  tt  .  How  fast  convergence  takes  place 

still  is  a  key  question,  however.  It  turns  out  [see  Kaufman,  et  at. 

(1981)]  that  working  with  the  full  Q  matrix,  even  though  it  is  singu¬ 
lar,  speeds  convergence,  and  this  is  what  we  also  do. 
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Another  procedure  is  to  use  the  uniformized  embedded  DTMG  of  the 
randomization  procedure  vzith  transition  probability  matrix  P  =  Q/A  +  I  . 
This  Markov  chain  has  limiting  probabilities  given  by 

$  =  &  >  (ID 

and  they  are  identical  to  the  rr  of  the  CTMP  we  seek  [<j>  =  <j>P  ^  - 

(Q/ A)  +-x])  =*>  0  =  (j>(Q/A)  =*>  0  =  <j>Q  =  0  -  ttQ  ].  Solving  the  set  of 
equations  given  by  (11)  is  no  easier,  of  course,  than  solving  that  of 
(5).  However,  we  know  from  Markov  chain  theory  that  limiting  probabil¬ 
ities  of  a  DTMC  can  be  found  by  iteration,  namely, 


t r 


(n+1) 


(12) 


Here  again,  we  have  computational  problems  associated  with  iteration, 
but  we  know  from  Markov  chain  theory  that  convergence  is  guaranteed  due  to 
the  existence  of  a  steady  state  vector  7T  (the  P  matrix  is  irreducible) * 
The  problem  of  when  to  stop  the  iterations  remains,  however.  Using  the 
Cauchy  criterion  here  results  in  problems  similar  to  those  found  when 
using  it  for  G-S  iteration,  namely,  successive  probabilities  can  differ 
by  very  small  amounts  and  still  be  far  from  the  steady  state  values, 

Wallace  and  Rosenberg  (1966)  provide  a  considerably  better  stop¬ 
ping  criterion  than  the  Cauchy  criterion  of  (10) ,  Their  stopping 
rule  is  based  on  estimating  the  rate  of  convergence  by  estimating 
the  second  eigenvalue  of  P  ,  and  turns  out  to  be:  'hStop  when 


(n+1)  .  (n) 

ir  .  ~  .?r. ..  J 

1 

(n+1) 

(tl; 

► 

U/n 

1  - 

7T 

“IT 

J 

<  e 


0  ’ 


(13) 


For  details  of  this  development,  see  Wallace  and  Rosenberg  (1966)  or 
Gross,  Kioussis,  Miller,  and  Soland  (1984). 
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4,  Results 

The  following  section  gives  a  brief  summary  of  results  to  date. 
For  greater  detail,  we  refer  the  reader  to  Gross  and  Miller  (1984b)  and 
Gross,  Kioussis,  and  Miller  (1984)  for  the  transient  case  and  to  Gross, 
Kioussis,  Miller,  and  Soland  (1984)  for  the  steady-state  case. 

4.1  Transient  Case 

The  largest  system  solved  to  date  using  equation  (3)  directly 
was  a  (2,2,2)  system  (as  pictured  in  Figure  1)  with  18  components  at 
base  1  (of  which  4  were  spares) ,  13  at  base  2  (of  which  3  were  spares) , 
and  3  spares  at  the  depot.  The  base  repair  shops  had  2  parallel  service 
channels  each,  and  the  depot  repair  facility  had  four.  This  gave  a 
state  space  of  20,748  (Q  =  20,748  x  20,748). 

The  time-varying  environment  scenario  is  shown  in  Figure  2.  At 
time  6  ,  a  shift  in  MTTF  (l/X)  occurs  but  it  takes  until  time  10  for 
the  repair  facilities  to  "catch  up"  in  MTTR  (1/y) .  This  simulates  a 
change  in  usage  due  to,  say,  a  shift  from  peacetime  to  wartime.  The 
measure  of  effectiveness  calculated  is  the  availability  at  time  t 
(t  -  1,2,..., 15)  ,  where  availability  is  defined  as  follows: 

A^(t)  E  Pr{at  least  14  components  are  operational  at  base  1  at 
time  t} 

A2(t)  =  Pr {at  least  10  components  are  operational  at  base  2  at 

time  t} 

Aj2(t)  ^  Pr{at  least  14  components  at  base  1  and  at  least  10 

components  at  base  2  are  simultaneously  operational 
at  time  t}  . 
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Figure  2.  Time-varying  environment  scenario  for  sample  run. 


Figure  3  shows  a  plot  of  A^(t)  versus  t  .  Plots  of  A^(t)  and 

A^(t)  are  similar  in  nature.  The  graph  shows  an  initial  A^(0)  of  1.0 

(we  assume  at  time  zero  all  components  are  operational)  and  thereafter 

a  drop-off  toward  the  steady-state  availability  as  time  increases.  At 

time  6,  the  increase  in  failure  rate  occurs  and  A^(t)  begins  to  drop 

off  at  a  higher  rate,  heading  for  a  new,  lower  steady-state  availability. 

However,  the  increase  in  repair  rate  at  time  10  causes  A^(t)  to  begin 

to  rise,  heading  back  toward  the  original  steady-state  availability.  This 

run  took  approximately  25  minutes  of  CPU  time  on  a  VAX  11/780  computer 

using  the  randomization  computation  of  (3)  with  the  efficient  procedure 

(k) 

given  in  Gross  and  Miller  (1984a)  for  calculating 

As  the  systems  become  more  complex  (more  bases,  multiple  component 
types,  indenture,  more  echelons,  etc.)  the  state-space  grows  rapidly.  We 
have  solved  a  problem  with  three  bases,  yielding  a  state-space  of  size 
43,278,703,  by  truncating  the  state-space  ("lumping"  low  probability 
states  into  several  absorbing  states  resulting  in  a  truncated  state  space 


r  . 
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Figure  3.  A^(t)  versus  t  for  sample  run. 
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of  approximately  15,000  states)  on  i he  VAX  11/780  in  approximately  30 
minutes  [see  Gross,  Kioussis,  and  Miller  (1984)], 
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4  .  2 _ S t  e a dy- state  Case 

Ironically,  computational  success  has  been  far  more  elusive  for 
the  steady-state  situation  than  for  the  transient  case.  The  problem  is 
the  stopping  criterion  for  these  iterative  procedures  (a  problem  not 
present  when  dealing  with  transient  solutions) .  In  the  transient  case, 
the  randomization  procedure  guarantees  an  accuracy  to  within  a  pre¬ 
specified  £  .  For  steady  state,-  using  either  the  Cauchy  or  the 
Wallace-Rosenberg  stopping  rule  does  not  guarantee  errors  within  £  . 
Table  1  shows  some  computations  for  a  (1,1,1)  system  which  is  the 
classical  machine  repair  with  spares  model  of  queueing  theory.  For  this 
model  the  availability  can  be  computed  analytically,  which  allowed 
us  to  estimate  the  actual  error.  The  columns  under  P -TO  show  the  results 
of  using  (12)  with  the  stopping  criterion  of  (13) ,  the  Wallace-Rosenberg 
approach,  while  the  GS-C  columns  show  results  for  (8)  with  the  stopping 
criterion  of  (10),  the  Gauss-Seidel  approach. 

The  circled  elements  show  the  cases  for  which  the  error  specifi¬ 
cation,  £  ,  was  exceeded.  While  there  were  more  cases  of  exceeding  the 
stopping  rule  error  specification  in  P-WR,  the  error  excesses  were 
larger,  especially  for  the  larger  population  cases,  under  GS-C.  But 
GS-C  stopped  in  far  fewer  iterations  in  almost  all  cases  (except  for  the 
very  small  population  cases) ,  and  it  is  the  number  of  iterations  that 
consumes  most  of  the  CPU  time. 

The  last  column  shows  a  rerun  of  GS-C,  ignoring  the  stopping  cri¬ 
terion  and  performing  the  same  number  of  iterations  as  used  for  the  P-WR 
procedure.  The  errors  essentially  went  to  zero,  which  indicates  that 
if  a  better  stopping  criterion  could  be  found,  Gauss-Seidel  iteration 
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RESULTS  FOR  (1,1,1)  MODEL 


^Percentage  of  time  the  population  is  operating  at  full  strength 


T-490 

might  be  a  viable  approach.  Runs  for  some  (2,2,2)  systems  and  more 
detailed  discussion  of  these  steady-state  procedures  can  be  found  in 
Gross,  Kioussis,  Miller,  and  Soland  (1984). 
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STUDYING  MODEL  ASSUMPTIONS 
IN  PROCESS  QUALITY  CONTROL 


C.  P.  Quesenberry 
North  Carolina  State  University 
Raleigh,  NC  27695-8203 

1.  INTRODUCTION 

In  process  quality  control  by  variables  we  typically  assume  that  a 

sequence  of  observations  ,  X^ ,  .  ..,  X^,  ...  is  observed  in  time.  The 

X^'s  themselves  are  often  batch  means.  The  basic  model  usually  assumes  that 

the  X^'s  are  independently  and  identically  distributed  normal  random  variables, 

2 

i^.e_.,  are  i.i.d.  N(p,  a  )  r.v.'s.  Some  of  the  most  widely  used  methods  of 
process  quality  control,  such  as  Shewhart  charts  for  sample  means  and  ranges 
and  CUSUM  tests  for  sample  means,  are  designed  to  detect  changes  in  the 
process  that  cause  either  u  or  a  to  shift  to  different  values.  However,  the 
observed  data  contain  much  information  that  can  be  used  for  purposes  other 
than  detecting  changes  in  the  mean  and  variance  of  an  assumed  normal  model. 

In  this  talk,  we  shall  propose  computing  certain  statistics  which  capture 
all  of  the  information  in  the  data,  beyond  that  in  the  sample  mean  and 
variance,  and  consider  methods  of  using  these  statistics  to  detect  a  number 
of  types  of  violations  of  the  basic  model  assumptions. 

The  statistics  which  we  propose  to  compute  are  called  sequential  uni- 

* 

form  residuals,  due  to  the  fact  that  they  have  known  uniform  distributions 
when  the  normal  model  is  correct.  These  residuals  are  not  new,  but  have 
been  derived  and  studied  by  the  present  speaker  and  co-workers  in  earlier 
work,  however,  they  have  not  been  considered  in  the  context  of  process 
quality  control.  We  feel  that  they  have  excellent  potential  for  many 
useful  applications  in  this  area. 
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2.  UNIFORM  AND  NU  RESIDUALS 


Let  X^,  X^,  Xn>  ...  denote  a  sequence  of  random  variables,  and 

O 

when  these  are  i.i.d.  N(y,os)  random  variables  we  shall  say  that  normal 
model  assumptions  hold.  We  define  the  following  (sequential)  sample 
quantities : 

*r  ■  Sr  ■  y*)  -  V2/r- 

(to  compute  these  sequentially,  see  Youngs  and  Cramer,  1972) 

*r  ■  [<r  -  *  Xr]/r,  S*  =  <^>8^  *  <^)(Xr  -  \)2 

Ar  =  [r(r  -  2)]?S(Xr  -  \) / {r  - 


u  0  =G  0(A  );  r  =  2,  3,  ....  n,  ... 

When  G  ( • )  is  a  Student-t  distribution  function  with  v  degrees  of  freedom. 

If  the  normal  model  assumptions  hold,  then  the  values  u^,  ••• 

were  shown  in  0* Reilly  and  Quesenberry  (1973)»0-Q,  to  be  i.i.d.  uniform 

random  variables  on  the  unit  interval.  A  careful  examination  of  the  value 

A  above  shows  that  it  is  a  Studentized  value  of  the  residual  of  the  r*'*1 
r 

observed  value  from  the  mean  of  the  first  r  values.  For  this  reason,  we 
shall  call  the  value  u^  a  uniform  residual .  Our  purpose  in  this  paper  is 
to  consider  using  these  uniform  residuals  and  other  related  quantities  in 

methods  for  studying  the  model  assumptions  in  statistical  process  control. 

* 

These  uniform  residuals  were  first  derived  in  0-Q  by  the  method  of 
conditional  probability  integral  transformations  introduced  by  those 
authors.  These  residuals  can  also  be  obtained  as  a  special  case  of  the 
uniform  residuals  from  regression  models  with  normal  error  structure  as 
given  in  0-Q,  and  considered  also  in  Quesenberry  (1983)  and  Hester  and 
Quesenberry  (198U).  In  the  present  work,  we  shall  discuss  some  ways  in 


which  these  sequential  uniform  residuals  can  be  used  in  statistical  process 
control,  however,  we  first  set  out  some  basic  properties  of  these  statistics 
which,  we  feel,  motivate  the  use  of  these  quantities.  When  normal  model 
assumptions  hold,  the  uniform  residuals  have  the  following  properties: 

Property  1.  The  quantities  m  ,  u0,  u  ,  are  i.i.d.  uniform  random 

variables  on  the  unit  interval  (0,1).  (0-Q,  1973). 

Property  2.  The  vector  =  (u^,  u^)'  independent  of  the  vectors 

$r  =  (X3.  ,  X.) •  and  g2r  =  ( S2 ,  ...,  sj)’  for  all  r  =  2,  3 . (This 

follows  from  the  completeness  and  sufficiency  of  T^  =  (X^,  S^)',  and  a  well-known 
result  of  Easu  (1955). ) 

Property  3.  The  uniform  residuals  vector  is  a  maximal  invariant  with 
respect  to  linear  transformations  of  the  data.  (Quesenberry  and  Starbuck 
(1976),  Q-S). 

Property  U .  If  a  most  powerful  similar  or  a  most  powerful  invariant  (re 
to  linear  transformations)  test  exists  for  testing  the  normal  model  against 
any  particular  alternative,  then  a  test  of  equal  power  can  be  based  on  the 
uniform  residuals.  (Q-S). 

Let  <t>(  •  )  denote  the  distribution  function  of  a  N(0,l)  distribution,  and 
#  •  )  its  inverse  function.  Then  we  define  NU  (normal-uniform)  residuals 

Zj  =  4>_1(uj )  ,  j  =  1,  2,  . .  • 

and  z^,  ...  are  i.i.d  N(0,l)  r.v.'s,  when  the  normal  model  holds. 

For  some  problems,  there  are  advantages  in  considering  the  NU  residuals 
rather  than,  or  in  addition  to,  the  uniform  residuals. 
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3.  STUDYING  THE  RESIDUALS 


In  view  of  the  properties  of  the  sequential  uniform  residuals 
discussed  in  section  2,  the  principal  task  is  to  find  effective  ways  to 
study  these  values  in  order  to  detect  deviations  from  model  assumptions. 

We  can  consider  two  basic  settings,  which  present  somewhat  different 
problems.  First,  we  can  consider  the  analysis  of  residuals  for  a  set  of 
past  data,  and,  second,  we  can  consider  the  sequential  analysis  of  data 
as  it  is  observed  in  time.  In  each  of  these  cases,  we  may  wish  to  use  the 
data  to  attempt  to  identify  different  types  of  model  misspecification  which 
can  occur.  There  are  many  different  types  of  problems  which  can  be 
considered  and  we  must  limit  our  discussion  here  to  a  few  particular 
problems.  In  this  section  we  consider  the  simple  but  useful  method  of 
graphing  the  ufs  in  order  to  look  for  patterns,  and  then  suggest  tests  for 
uniformity  on  the  uniform  residuals  and  of  normality  on  the  NU  residuals 
as  general  methods  of  analysis  to  detect  anomolies  in  the  data.  In  the 
next  section,  we  suggest  using  these  residuals  to  detect  outliers. 

We  consider  first  some  methods  for  analyzing  the  uniform  and  NU 
residuals  that  are  designed  to  detect  a  wide  range  of  deviations  from  the 

basic  normal  model.  Since  these  methods  are  expected  to  perform  reasonably 

% 

well  in  detecting  a  large  class  of  alternatives,  we  cannot  expect  them  to 
be  most  effective  against  particular  restricted  alternatives.  If  we  wish 
to  focus  on  a  particular  alternative,  then  it  may  be  possible  to  find  a 
test  or  other  analysis  technique  which  is  especially  sensitive  for  detecting 
it.  In  this  section,  we  shall  consider  analyses  based  on  plotting  the  re¬ 
siduals  and  computing  some  omnibus  goodness-of-fit  tests. 
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As  a  first  analysis,  we  suggest  plotting  the  uniform  (or  NU)  residuals 
against  the  index  and  studying  these  plots  for  trends.  Due  to  the  recursive 
property  of  these  residuals,  the  plots  can  be  made  sequentially  when  each 
observation  is  taken,  in  order  to  identify  problems  as  early  as  possible. 
Under  the  null  hypothesis,  the  uniform  residuals  should  tend  to  form  a  uni¬ 
form  band  between  the  lines  u  =  0  and  u  =  1.  The  possible  types  of 
patterns  that  indicate  model  specification  errors  is  large.  Indeed,  any 
recognizable  pattern  among  these  points  will  likely  require  further  study. 
The  type  of  misspecification  that  leads  to  a  particular  point  pattern  can 
sometimes  be  deduced  by  recalling  the  nature  of  the  transformations  in  (2.1) 
viz . ,  that  Aj  is  a  Studentized  residual  and  Uj  is  obtained  by  transforming 

A  with  the  appropriate  Student-t  distribution  function. 

J 

The  uniform  and  NU  residuals  are  equivalent  statistics,  and  contain 
the  same  information;  however,  some  patterns  or  anomalies  will  be  more 
apparent  in  the  graphs  of  one  or  the  other  of  these  types  of  residuals.  One 
instance  of  this  is  in  detecting  outliers.  The  plots  of  NU  residuals 
will  display  outliers  more  clearly  than  will  the  plots  of  uniform  residuals. 

The  detection  of  outliers  will  be  considered  in  Section  h . 

An  Omnibus  Test  for  Uniformit 


% 

One  type  of  test  statistic  which  we  shall  often  want  to  compute  is  an 
omnibus  test  of  simple  uniformity  on  the  values  of  j£.  Such  omnibus  tests 
have  reasonably  good  power  against  a  wide  range  of  alternatives.  There  are 
a  large  number  of  omnibus  goodness-of-fit  tests  which  can  be  used  to  make 
tests  for  uniformity.  Reasonably  extensive  power  studies  of  tests  for  uni¬ 
formity  have  been  made  by  Quesenberry  and  Miller  (1977)*  Q-M,  and  by  Miller 
and  Quesenberry  (1979),  M-Q.  These  papers  review  much  of  the  literature  in 
this  area  of  goodness-of-fit  testing.  Based  upon  the  results  in  these  papers. 


we  recommend  the  Neyman  smooth  test , 
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Neyman  (1937)  posed  a  statistic  designed  to  have  high  power  for  test¬ 
ing  uniformity  against  certain  classes  of  alternative  distributions  on  the 


unit  interval.  (See  M-Q  and  Kendall  and  Stuart  (1961),  p.  UUU . )  The  test  is 
defined  as  follows.  The  Legendre  polynomials  are  given  by;  for  r  =  0, 

1 ,  2 *  3,  U ;  and  0  <_  y  <_  1 ; 

»0(y)  =  l.n-Jy)  =  ^12(y~h) . 

112(y)  =  /T[6(y-H)2  -  H],  *g(y)  =  /f[20(y-H)3  -  3(y-?s)],  (3- 1 ) 

TT^(y)  =  210{y-h)]*  -  U5(y-%)2  +  9/8. 

Then  put 


N 

t  =  E  IT  (u  )  for  r  =  1,  2,  3, 
J=1  J 


and  the  Neyman  smooth  test  rejects  for  large  values  of  the  statistic 

p2  =  ( l/N)  (t2  +  t2  +  t2  +  t2).  (3.2) 

2  2 

Neyman  showed  that  p^  has  a  limiting  x  (*0  distribution  when 
u^ ,  .  u^  are  i.i.d.  uniform  r.v.'s.  Computations  of  upper  .1,  .05,  and 

.01  percentage  points  in  M-Q  indicate  that  this  approximation  is  reasonably 

good  for  N  as  small  as  ten,  or  even  smaller  in  some  cases.  This  approximation 

is  particularly  convenient  because  it  can  be  used  to  determine  the  observed 

significance  level  or  p-value  of  the  test  for  uniformity.  IJms ,  in  order 

to  obtain  an  overall  assessment  of  the  validity  of  the  normal  model,  we 

2 

compute  the  Uj ’s  from  (2.1),  the  value  of  p^  from  (3*2),  and  then  evaluate 

NS4_PV  =  p-value  =  P(x2(1+)  >  p2}.  (3.3) 

For  0  <  a  <  1,  if  NS^_PV  a,  then,  of  course,  we  reject  the  norma] 
model  at  the  a  level.  In  practice,  we  compute  NSl4_PV  and  view  it  as  a 
general  coefficient  of  validity  of  the  normal  model. 
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Tests  of  Normality  on  NU  Residuals 


In  addition  to  (or  in  place  of)  omnibus  tests  for  uniformity,  we  can  test 
the  normality  of  the  NU  residuals  z^9  .  ..,  z^T,  defined  at  the  end  of 
Section  2.  There  are  many  good  tests  of  normality  available  today,  and  there 
is  no  strong  reason  to  favor  a  particular  one.  We  slightly  prefer  the 
Anderson-Darling  test.  Two  points  should  be  noted  in  this  context.  Although 
Zly  ***’  ZN  ^  =  n-2)  are  i.i.d.  N(0,l)  when  the  normal  assumptions  hold, 
it  has  been  shown  by  Stephens  (197*0  and  independently  by  Dyer  (197*0  that 
statistics  for  testing  composite  normality  often  have  better  power  for 
testing  simple  normality  than  tests  designed  to  test  the  simple  normality 
null  hypothesis.  Another  point  that  should  be  noted  is  that  the  most  popular 
tests  for  normality  do  not  have  solved  distribution  theory  that  allows  the 
exact  determination  of  p-values  of  the  tests.  We  feel  thi3  is  a  considerable 
disadvantage  for  these  tests. 

*4.  DETECTING  OUTLIERS 

In  many  process  control  problems,  an  occasional  observation  will  appear 
which  is  either  much  larger  or  smaller  than  its  fellows.  The  question  then 
is  how  one  is  to  decide  when  an  observation  is  an  "outlier11  and  when  it  is 
a  feasible  value  under  the  normal  model  assumptions.  The  exact  distribu¬ 
tion  theory  of  uniform  residuals  provides  an  especially  simple  and  elegant 
solution  for  the  problem  of  detecting  outliers. 

We  shall  declare  an  observation  a  left  outlier  if  it  is  too  small  and 
thus  its  uniform  residual  is  too  near  zero,  and  we  call  it  a  right  outlier 
if  it  is  too  large  and  its  uniform  residual  is  thus  too  near  one.  In  view 
of  the  nature  of  sequential  uniform  residuals,  this  means  that  an  observa¬ 
tion  is  called  a  left  outlier  if  it  is  too  small  when  compared  with  the 
observations  preceding  it,  and,  similarly,  it  is  a  right  outlier  if  it  is 
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too  large  in  comparison  with  the  values  preceding  it.  Thus  these  might 
be  called  sequential  outliers . 

Suppose  that  we  are  willing  to  incorrectly  decide  that  observations 
are  too  large  and  that  they  are  too  small  each  at  the  rate  of  1  in  Nq 
observations,  when  the  normal  model  is  correct.  Then  we  apply  the  follow¬ 
ing  rejectiqn  (identification)  rule: 

Declare  x  a  left  outlier  if  u  „  <  l/N  , 
r  r-2  o 

Declare  x r  a  right  outlier  if  ^  >  (N^-l)/!^.  (^.1) 

The  overall  rejection  rate  is  2/Nq,  when  the  model  is  correct.  This  is 
a  reasonable  procedure  for  screening  observations  as  they  arrive  sequentially. 
This  procedure  is,  of  course,  equivalent  to  rejecting  individual  observa¬ 
tions  as  outliers  if  they  fall  outside  the  lines  u  =  l/N  and  u  =  (N  -l)/N 

o  o  o 

on  the  graph  discussed  in  Section  3. 

If  we  wish  to  decide  if  a  sequence  u^ ,  u^  of  residuals  from  past 

data  contains  outliers  we  can  apply  the  rule  above  to  perform  tests,  or  we 
can  appeal  directly  to  the  distribution  theory  for  order  statistics  from 


a  uniform  distribution. 


Let  u(l)  and  u(n) 


denote  the  smallest  and  largest 


values  among  the  residuals,  respectively.  Moreover,  let  p  denote  the 
p— value  for  testing  that  the  point  associated  with  u^  ^  is  a  left  outlier, 
and  PR  denote  the  p-value  for  testing  that  the  point  corresponding  to 
is  a  right  outlier.  When  the  normal  model  holds  these  values  are  given  by 


PL=1‘  (1  -“(!)>  •  PR  =  1  -  u(„)  ■ 


ft. a 


The  use  of  these  formulas  will  be  illustrated  with  numerical  examples 


in  the  following  section. 
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5.  NUMERICAL  EXAMPLES 


To  illustrate  the  techniques  discussed  above,  we  have  computed  the  uni 
form  residuals  for  a  number  of  data  sets.  A  random  sample  of  size  50  was 
generated  from  each  of  four  distributions  and  the  uniform  residuals  were 
plotted  against  the  index.  Samples  were  drawn  from  a  normal,  exponential, 
uniform  and  Cauchy  distribution.  The  graphs  for  these  samples  are  given  in 
the  following  Figures  1  -  U.  The  p-values  NSU  PV,  P  and  PD  are  given  in 
Table  1. 

The  plot  of  the  uniform  residuals  for  the  normal  sample  in  Figure  1 
shows  no  anamolous  patterns  and  the  p-values  are  easily  in  the  acceptance 
range . 

The  graph  for  the  exponential  sample  in  Figure  2  does  show  important 
patterns  that  indicate  a  nonnormal  distribution.  There  are  ncD  observa¬ 
tions  very  near  zero  -  which  is  a  reflection  of  the  fact  that  the  normal 
density  is  positive  on  the  negative  reals  but  the  exponential  density  is 
zero  on  the  negative  reals.  Note  that  the  p-values  are  all  suspect.  The 
Neyman  smooth  statistic  p- value  is  0.01386,  P^  =  0.99981  is  too  large ,  and 
PL  =  0.02536. 

The  analysis  of  the  uniform  sample  shows  a  value  of  NS4_ PV  =  0.06839, 

which  is  suspect,  and  a  value  of  PD  that  is  too  large,  again.  %  The  graph  in 

n 

Figure  3  shows  no  points  very  near  zero  or  one,  which  is  a  reflection  of 
the  fact  that  the  uniform  density  has  thinner  tails  than  the  normal  density 

The  Cauchy  sample  is  easily  rejected  by  the  goodness-of-fit  test,  and 
its  tendency  to  throw  outliers  is  evident  in  the  p-values  of  the  order 
statistics . 

Finally,  we  computed  the  uniform  residuals  and  the  p-values  for  the 


data  given  by  Ott  (1975)*  Note  that  in  Figure  5  for  this  data  the 
points  display  a  rising  trend  beginning  at  about  the  76th  or  77th  original 
data  points.  This  is  due  to  a  trend  in  the  data  discussed  by  Ott.  Also, 
both  P.  and  PR  are  significant  at  the  0.05  level. 
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TABLE  1:  GOODNESS  OF  FIT  AND  OUTLIERS  P-VALUES 


Parent 

P 

P 

Distribution 

NSl*_PV 

L 

PR 

Normal 

Exponential 

Uniform 

Cauchy 

Ott 


0. U5931 

0.93523 

0Ji9131 

0.01386 

0.99981 

0.02536 

0.06839 

0.50572 

0.98125 

7.95071E-08 

2.06501E-U 

0.02117*' 

0.039851 

0.0114772 

0.01(25*4 , 
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Recent  Research  in  Experimental  Design  for  Quality  Improvement 
with  Applications  to  Logistics 


George  E.  P.  Box 


1,  LOGISTICS  AND  QUALITY  CONTROL 


Important  measures  of  military  competence  such  as  performance  capability 
and  readiness  rate  are  greatly  influenced  by  the  quality  of  the  weapons  and  of 
the  other  devices  available  to  the  soldier, 

A  traditional  philosophy  of  quality  control  has  been  to  "inspect  bad 
quality  out"  and  indeed  there  are  famous  military  standards  that  employ  this 
philosophy.  W.  Edwards  Deming  (1982)  has  likened  this  to  making  toast 
according  to  the  recipe  "you  burn  it  and  I'll  scrape  it",  and  has  urged  the 
alternative  philosophy  of  assuring  that  good  quality  has  been  built  in  to  the 
product  in  the  first  place.  In  particular  he  attributes  to  the  latter 
philosophy  the  success  of  Japanese  industry  in  producing  high  quality  products 
at  low  cost.  A  typical  example  of  the  dramatic  consequences  that  have  been 
attributed  to  these  differences  of  approach  are  the  air-condi tioner  defect 
rates  shown  in  Table  1  and  quoted  by  David  Garvin  (1983). 


(In  the  factory:  Assembly  line  defects  per  100  units) 

American  Japanese 


Total  .  63.5  0.95 

Leaks  .  3.1  0.12 

Electrical  . . . .  3.3  0.12 


(In  the  field:  Service  call  rate  per  100  units  under 
first  year  warranty  coverage) 

American  Japanese 


Total  .  10.5  0.6 

Compressors  .  1.0  0.05 

Thermostats  .  1  .4  0.002 

Fan  motors  . 0.5  0.028 


TABLE  1 .  Defect  rates  in  US  and  Japanese  air  conditioners 


Similar  comparisons  have  been  made  between  defect  rates  in  American  and 
Japanese  automobiles. 

The  same  United  States  industry  that  makes  air  conditioners  and  motor 
vehicles  also  makes  military  hardware.  It  seems  clear  therefore  that  a  major 
change  in  quality  philosophy  could  produce  a  major  improvement  in  the 
reliability  of  the  Army's  equipment.  The  philosophy  of  "building  quality  in" 
employs  a  policy  of  never  ending  quality  improvement  which  may  be  typified  in 


Sponsored  by  the  United  States  Army  under  Contract  No.  DA AG29-R0-C-0  04 1  . 
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terms  of  the  traditional  statistical  model 


y  =  ffx^  +  e 

where  y  is  a  quality  characteristic  believed  to  depend  on  a  set  of  variables 
denoted  by  x^  whose  identity  is  known/  and  e  is  the  difference  y  -  f(x^) 
usually  referred  to  as  error*  (Such  "errors”  are  often  somewhat  arbitrarily 
imbued  by  the  theoretician  with  properties  of  randomness,  normality 
independence  and  homoscedasticity ) .  In  reality  e  is  a  function  e(x^)  a 

number  of  additional  variables,  say,  which  affect  the  system  but  whose 

identity  is  usually  unknown.  In  general,  quality  improvement  is  achieved  by 
transferring  elements  of  the  unknown  factor  vector  x^  into  the  known  factor 
vector  x^  as  indicated  below 


4- - *» 

y  =  f (x1 )  +  e(x2) 

known  unknown 

The  effect  of  such  transfer  is  two-fold 

(i)  to  reveal  effects  of  previously  unknown  factors  which  may  then  be 

adjusted  to  levels  yielding  higher  quality  and/or  used  to  control  the 
process. 

(ii)  to  remove  variation  previously  caused  by  haphazard  changes  in  these 
factors. 

Some  of  the  statistical  techniques  which  contribute  to  this  transfer  are 
quality  control  charting  (including  Shewhart,  Cusum,  Pareto  and  Fishbone 
charts)  and  designed  experimentation  on  line  and  off  line  (employing  in 
different  and  appropriate  contexts  factorial,  fractional  factorial  and 
orthogonal  array  designs,  evolutionary  operation  and  response  surface 
methods ) . 


2.  SCIENTIFIC  METHOD  AND  QUALITY 


Charting  and  experimentation  are  examples  respectively  of  passive 
surveillance  and  active  intervention  both  of  which  are  important  elements  in 
scientific  method  which  it  is  desirable  to  consider  further. 

Humans  differ  from  other  animals  most  remarkably  in  their  ability  to 
learn.  It  is  clear  that  although  throughout  the  history  of  mankind 
technological  learning  has  taken  place,  until  three  or  four  hundred  years  ago 
change  occurred  very  slowly.  One  reason  for  this  was  that  in  order  to  learn 
something  -  for  example,  how  to  make  fire  or  champagne  -  two  rare  events 
needed  to  coincide:  (a)  an  informative  event  had  to  occur,  and  (b)  a  person 
able  to  draw  logical  conclusions  and  to  act  on  them  had  to  be  aware  of  that 
informative  event. 

Passive  surveillance  is  a  way  of  increasing  the  probability  that  the  rare 
informative  event  will  be  constructively  taken  note  of  and  is  exemplified  by 
quality  charting  methods.  Thus  a  Shewhart  chart  is  a  means  to  ensure  that 
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possibly  informative  events  are  brought  to  the  attention  of  those  who  may  be 
able  to  discover  in  them  an  "assignable  cause"  (Shewhart  1931)  and  act 
appropriately. 

Active  intervention  by  experimentation  aims,  in  addition,  to  increase  the 
probability  of  an  informative  event  actually  occurring.  A  designed  experiment 
conducted  by  a  qualified  experimenter  can  dramatically  increase  the 
probability  of  learning  because  it  increases  simultaneously  the  probability  of 
an  informative  event  occurring  and  also  the  probability  of  the  event  being 
constructively  witnessed.  Recently  there  has  been  much  use  of  experimental 
design  in  Japanese  industry  particularly  by  Genichi  Taguchi  (Taguchi  and  Wu 
(1980))  and  his  followers.  In  off-line  experimentation  he  has  in  particular 
emphasized  the  use  of  highly  fractionated  designs  and  orthogonal  arrays  and 
the  minimization  of  variance. 

In  the  remainder  of  this  paper  we  briefly  outline  some  recent  research  on 
the  use  of  experimental  design  in  the  improvement  of  quality. 

3.  USE  OF  SCREENING  DESIGNS  TO  IMPROVE  QUALITY 

Table  2  shows  in  summary  a  highly  fractionated  two-level  factorial  design 
employed  as  a  screening  design  in  an  off-line  welding  experiment  performed  by 
the  National  Railway  Corporation  of  Japan  (Taguchi  and  Wu,  1980).  In  the 
column  to  the  right  of  the  table  is  shown  the  observed  tensile  strength  of  the 
weld,  one  of  several  quality  characteristics  measured. 

The  design  was  chosen  on  the  assumption  that  in  addition  to  main  effects 
only  the  two-factor  interactions  AC,  AG,  AH,  and  GH  were  expected  to  be 
present.  On  that  supposition,  all  nine  main  effects  and  the  four  selected 
two-factor  interactions  can  be  separately  estimated  by  appropriate  orthogonal 
contrasts,  the  two  remaining  contrasts  corresponding  to  the  columns  labelled 
e<|  and  measure  only  experimental  error.  Below  the  table  are  shown  the 

grand  average,  the  fifteen  effect  contrasts,  and  the  effects  plotted  on  a  dot 
diagram.  When  the  effects  are  plotted  on  normal  probability  paper,  thirteen 
of  them  plot  roughly  as  a  straight  line  but  the  remaining  two,  corresponding 
to  the  main  effects  for  factors  B  and  C,  fall  markedly  off  the  line, 
suggesting  that  over  the  ranges  studied,  only  factors  B  and  C  affect 
tensile  location  by  amounts  not  readily  attributed  to  noise. 

If  this  conjecture  is  true,  then,  at  least  approximately,  the  sixteen 
runs  could  be  regarded  as  four  replications  of  a  2^  factorial  design  in 
factors  B  and  C  only.  However,  when  the  results  are  plotted  in  Figure  1 
so  as  to  reflect  this,  inspection  suggests  the  existence  of  a  dramatic  effect 
of  a  different  kind  -  when  factor  C  is  at  its  plus  level  the  spread  of  the 
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To  facilitate  later  discussion  we  have  set  out  the  design  and  labelled  the 
levels  somewhat  differently  from  Taguchi. 
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data  appears  much  larger  than  when  it  is  at  its  minus  level •  Thus,  in 
addition  to  detecting  shifts  in  location  due  to  B  and  C,  the  experiment 
may  also  have  detected  what  we  will  call  a  dispersion  effect  due  to  C.  The 
example  raises  the  general  possibility  pursued  in  the  remainder  of  this  paper 
of  analyzing  unreplicated  designs  for  dispersion  effects  as  well  as  for  the 
more  usual  location  effects* 
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Figure  1 . 


Drying  Period  B 


Tensile  data  as  four  replicates  of  a  2  factorial 
design  in  factors  B  and  C  only. 


Data  of  this  Kind  might  be  accounted  for  by  the  effect  of  one  or  more 
variables  other  than  B  that  affected  tensile  strength  only  at  the  "plus 
level"  of  C  (only  when  the  alternative  material  was  used)*  Analysis  of  the 
eight  runs  made  at  the  plus  level  of  C  does  not  support  this  possibility, 
however. 


4.  RATIONALES  FOR  USING  SCREENING  DESIGNS 


Before  proceeding  we  need  to  consider  the  question ,  "In  what  situations 
are  screening  designs,  such  as  highly  fractionated  factorials,  useful?" 

4.1.  Effect  Sparsity 

A  common  industrial  problem  is  to  find  from  a  rather  large  number  of 
factors  those  few  that  are  responsible  for  large  effects*  The  idea  is 
comparable  to  that  which  motivates  the  use  in  quality  control  studies  of  the 
"Pareto  diagram."  (See,  for  example,  Ishikawa  (1976)).  The  situation  is 
approximated  by  postulating  that  only  a  small  proportion  of  effects  will  be 
"active"  and  the  rest  "inert" *  We  call  this  the  postulate  of  effect 
sparsity .  For  studying  such  situations,  higly  fractionated  designs  and  other 
orthogonal  arrays  (Finney  (1945),  Plackett  and  Burman  (1946),  Rao  (1947), 
Taguchi  and  Wu  (1980))  which  can  screen  moderately  large  numbers  of  variables 
in  rather  few  runs  are  of  great  interest.  Two  main  rationalizations  have  been 
suggested  for  the  use  of  these  designs;  both  ideas  rely  on  the  postulate  of 
effect  sparsity  but  in  somewhat  different  ways. 

4.2.  Rationale  Based  on  Prior  Selection  of  Important  Interactions 

It  is  argued  (see  for  example  Davies  (1954))  that  in  some  circumstances 
physical  knowledge  of  the  process  will  make  only  a  few  interactions  likely  and 
that  the  remainder  may  be  assumed  negligible.  For  example,  in  the  welding 
experiment  described  above  there  were  36  possible  two-factor  interactions 
between  the  nine  factors,  but  only  four  were  regarded  as  likely,  leaving  32 
such  interactions  assumed  negligible.  The  difficulty  with  this  idea  is  that 
in  many  applications  the  picking  out  of  a  few  "likely"  interactions  is 
difficult  if  not  impossible.  Indeed  the  investigator  might  justifiably 
protest  that,  in  the  circumstance  where  an  experiment  is  needed  to  determine 
which  first  order  (main)  effects  are  important,  it  is  illogical  that  he  be 
expected  to  guess  in  advance  which  effects  of  second  order  (interactions)  are 
important. 

4.3.  Projective  Rationale  Factor  Sparsity 

A  somewhat  different  notion  is  that  of  factor  sparsity.  Thus  suppose 
that,  of  the  k  factors  considered,  only  a  small  subset  of  unknown  size  d, 
whose  identity  is  however  unknown,  will  be  active  in  providing  main  effects 
and  interactions  within  that  subset.  Arguing  as  in  Box  and  Hunter  (1961)  a 
two-level  design  enabling  us  to  study  such  a  system  is  a  fraction  of 
resolution  R  =  d  +  1  (or  in  the  terminology  of  Rao  (1947)  an  array  of 
strength  d)  which^produces  complete  factorials  (possibly  replicated)  in 
every  one  of  the  ( ^ J  spaces  of  d  =  R  -  1  dimensions.  For  example,  we  have 
seen  that  on  the  assumption  that  only  factors  B  and  C  are  important,  the 
welding  design  could  be  regarded  as  four  replicates  of  a  2  factorial  in 
just  those  two  factors.  But  because  the  design  is  of  resolution  R  =  3  the 
same  would  have  been  true  for  any  of  the  36  choices  of  two  out  of  the  nine 
factors  tested.  Thus  the  design  would  be  appropriate  if  it  were  believed  that 
not  more  than  two  of  the  factors  were  likely  to  be  "active". 
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For  further  illustration  we  consider  again  the  sixteen-run  orthogonal 
array  of  Table  2  and,  adopting  a  roman  subscript  to  denote  the  resolution  R 
of  the  design,  we  indicate  in  Table  3  various  ways  in  which  that  array  might 
be  used*  It  may  be  shown  that 


(a)  If  we  associated  the  fifteen  contrast  columns  of  the  design  with 

15-11  . 

fifteen  factors,  we  would  generate  a  design  providing  four-fold 

replication  of  22  factorials  in  every  one  of  the  105  two-dimensional 
projections . 

(b)  If  we  associated  only  columns  1,  2,  4,  7,  8,  11,  13,  and  14  with 

g_4 

eight  factors  we  would  agenerate  a  2IV  design  providing  two-fold 
replication  of  23  factorials  in  every  one  of  the  56  three-dimensional 
projections. 


(c)  If  we  associated  only  columns  1,  2, 
we  would  generate  a  2y-^  design  providing  a 
the  four-dimensional  projections. 


4,  8,  and  15  with  five  factors 
24  factorial  in  every  one  of 


(d)  If  we  associated  only  columns  1,  2,  4,  and  8  with  four  factors  we 
would  obtain  the  complete  2^  design  from  which  this  orthogonal  array  was  in 
fact  generated. 


Designs  (a),  (b)  and  (c)  would  thus  be  appropriate  for  situations  where  we 

believed  respectively  that  not  more  than  2,  3,  or  4  factors  would  be 
active  .  Notice  that  intermediate  values  of  k  could  be  accommodated  by 
suitably  omitting  certain  columns.  Thus  the  welding  design  is  a  2m 
arrangement  which  can  be  obtained  by  omitting  6  columns  from  the  complete 
2jjj^  .  Notice  finally  that  for  intermediate  designs  we  can  take  advantage  of 
both  rationales  by  arranging,  as  was  done  for  the  welding  design,  that 
particular  interactions  are  isolated. 


* 

The  designs  give  partial  coverage  for  a  larger  number  of  factors,  for  example 

Q— 4 

(Box  and  Hunter  (1961))  56  of  the  70  four-dimensional  projections  of  the  2jV 
yield  a  full  factorial  in  four  variables. 
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A  discussion  of  the  interative  model  building  process  by  Bex  and  Jenkins 
(1970)  characterized  three  steps  in  the  iterative  data  analysis  cycle 
indicated  below 


identification 


fitting 


diagnostic  checking 


Most  of  the  present  paper  is  concerned  with  model  identification  -  the  search 
for  a  model  worthy  to  be  formally  entertained  and  fitted  by  an  efficient 
procedure  such  as  maximum  likelihood.  The  situation  we  now  address  concerns 
the  analysis  of  fractional  designs  such  as  the  welding  design  in  the  above 
context  when  only  a  few  of  the  factors  are  likely  to  have  effects  but  these 
may  include  dispersion  effects  as  well  as  location  effects. 


5.  DISPERSION  EFFECTS 

We  again  use  the  design  of  Table  2  for  illustration.  There  are  16  runs 
from  which  16  quantities  ~  the  average  and  15  effect  contrasts  —  have  been 
calculated.  Now  if  we  were  also  interested  in  possible  dispersion  effects  we 
could  also  calculate  15  variance  ratios.  For  example,  in  column  1  we  can 
compute  the  sample  variance  for  those  observations  associated  with  a 

minus  sign  and  compare  it  with  the  sample  variance  s* .  for  observations 

■  *  2  2 

associated  with  a  plus  sign  to  provide  the  ratio  =  s^/s-j  +  .  If  this  is 

done  for  the  welding  data  we  obtain  values  for  lnF^*  given  in  Figure  2(a). 

It  will  be  recalled  that  in  the  earlier  analysis  a  large  dispersion  effect 
associated  with  factor  C  (column  15)  was  found,  but  in  Figure  2(a)  the 
effect  for  factor  C  is  not  especially  extreme,  instead  the  dispersion  effect 
for  factor  D  (column  1)  stands  out  from  all  the  rest.  This  misleading 
indication  occurs  because  we  have  not  so  far  taken  account  of  the  aliasing  of 
location  and  dispersion  effects.  Since  sixteen  linearly  independent  location 
effects  have  already  been  calculated  for  the  original  data,  calculated 
dispersion  effects  must  be  functions  of  these.  Recently  (Box  and  Meyer  1984a) 
a  general  theory  of  location-dispersion  aliasing  has  been  obtained  for 
factorials  and  fractional  factorials  at  two  levels.  For  illustration,  in  this 
particular  example  it  turns  out  that  the  following  identity  exists  for  the 
dispersion  effect  that  is  the  F  ratio  associated  with  factor  D  and 

hence  for  column  1  of  the  design. 

A  ^  ^  ^  A  A  rt  A  A  O  A  A  A  A  A  _  A  A  n 

F  (2-3)  +(4-5)  +(6-7)  +(8-9)  +  ( 1 0- 1 1 )  +( 1 2- 13 )  +( 14- 1 5 ) 

1  (2+3)2+(4+5)2+(6+7)2+(8+9)2+( 10+1 1 ) 2+( 12+1 3 ) 2+( 14+1 5 ) 2 

A  A  A  A 

Now  (see  Table  2)  14  =  B  =  2.15  and  15  =  C  =  3.10  are  the  two  largest 

location  effects,  standing  out  from  all  the  others.  The  extreme  value  of  F^ 
associated  with  an  apparent  dispersion  effect  of  factor  D ( 1 )  is  largely 


★ 

In  this  figure  familiar  normal  theory  significance  levels  are  also  shown. 
Obviously  the  necessary  assumptions  are  not  satisfied  in  this  case,  but  these 
percentages  provide  a  rough  indication  of  magnitude. 
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Effect  D  H  e1  G  F  GH  AC  A  E  AH  e2  AG  J  B  C 
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accounted  for  by  the  squared  sum  and  squared  difference  of  the  location 
effects  B  and  C  which  appear  respectively  as  the  last  terras  in  the 
denominator  and  numerator  of  equation  (1).  A  natural  way  to  proceed  is  to 
compute  variances  from  the  residuals  obtained  after  eliminating  large  location 
effects.  After  such  elimination  the  alias  relations  of  equation  (1)  remain 
the  same  except  that  location  effects  from  eliminated  variables  drop  out. 

That  is  zeros  are  substituted  for  eliminated  variables.  Variance  analysis  for 
the  residuals  after  eliminating  effects  of  B  and  C  are  shown  in  Figure 
2(b),  The  dispersion  effect  associated  with  C  (factor  15)  is  no w  correctly 
indicated  as  extreme.  It  is  shown  in  the  paper  referenced  above  how,  more 
generally,  under  circumstances  of  effect  sparsity  a  location-dispersion  model 
may  be  correctly  identified  when  a  few  effects  of  both  kinds  are  present. 


6.  ANALYSIS  OF  UNREPLICATED  FRACTIONAL  DESIGNS 


Another  important  problem  in  the  analysis  of  unreplicated  fractional 
designs  and  other  orthogonal  arrays  concerns  the  picking  out  of  "active" 
factors.  A  serious  difficulty  is  that  with  unreplicated  fractional  designs  no 
simple  estimate  of  the  experimental  error  variance  against  which  to  judge  the 
effects  is  available. 

In  one  valuable  procedure  due  to  Cuthbert  Daniel  (1959,  1976)  effects  are 
plotted  on  Normal  probability  paper.  For  illustration  Table  4  shows  the 
calculated  effects  from  a  2^v '  design  used  in  an  experiment  on  injection 
molding  (Box,  Hunter  and  Hunter,  1978,  p.  399).  These  effects  are  plotted  on 
normal  probability  paper  in  Figure  3. 


T1 

= 

I 

o 

4- 

mold  temp. 

t2 

-0.1  +  2 

moisture  content 

t3 

= 

5.5  ■»  3 

holding  pressure 

t4 

= 

-0.3  *  4 

cavity  thickness 

*5 

= 

-3.8  +  5 

booster  pressure 

T5 

-0.1  -v  6 

cycle  time 

t7 

=? 

0.6  +  7 

gate  size 

T8 

= 

1.2  +  8 

screw  speed 

Tg  = 

T1  .2  =  _0,6  + 

1.2 

+ 

3.7 

+ 

4.8 

+ 

5.6 

T10  ~ 

t1.3  =  °‘9  ♦ 

1.3 

+■ 

2.7 

+ 

4.6 

+ 

5.8 

T1 1  = 

II 

1 

O 

1  .4 

+ 

2.8 

+ 

3.6 

+ 

5.7 

T12  = 

T1 .5  “  4,6  * 

1.5 

+ 

2.6 

+ 

3.8 

+ 

4.7 

t13  = 

T1 .6  0.3  + 

1.6 

+ 

2.5 

+ 

3.4 

+ 

7.8 

t14  " 

t1.7  =  -0,2  * 

1  .7 

+ 

2  .3 

+ 

6.8 

+ 

4.5 

Tib  = 

Ti.g  =  -0.6  ■» 

1 .8 

+ 

2.4 

+ 

3.5 

+ 

6.7 

8  —  4 

TABLE  4.  Calculated  effects  from  a  2IV  design  showing 

alias  structure  assuming  three  factor  and  higher  order 
interactions  negligible.  Injection  molding  experiment. 
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An  alternative  Bayesian  approach  (Box  and  Meyer,  1984b)  is  as  follows: 
Let  TlfT . Tv  be  standardized  effects  with 


if  effect  inert 


T.  =  e-  +  t.  if  effect  active 
ill 

2  2 

2  2  2  °  +  aT 

ei  -*■  N(0 ,0  ),  *  N(0fOT)  k  =  - - - 


Suppose  the  probability  that  an  effect  is  active  is  a. 


Let  a(r)  be  the  event  that  a  particular  set  of  r  of  the  v  factors 
are  active,  and  let  T  vector  of  estimated  effects  corresponding  to 

active  factors  of  a(r)*  Then,  (Box  and  Tiao,  1968)  with  p(o)  a  —  the 
posterior  probability  that  T  are  the  only  active  effects  is: 

r  r  — 

p[®,_jT,o#k]  «  [jrrz}  h  -  n  -  ~)  411} 

k 


(r) 


where  s(r)  "  £(r)~(r)  and  S  =  In  Particular  the  marginal  probability 

that  an  effect  i  is  active  give  T,  a  and  k  is  proportional  to 


I 

3(r) 

i  active 


rak 


A  study  of  the  fractional  factorials  appearing  in  Davies  (1954),  Daniel 
(1976)  and  Box,  Hunter  and  Hunter  (1978)  suggested  that  a  might  range  from 
0.15-0.45  while  k  might  range  from  5  to  15.  The  posterior  probabilities 
computed  with  the  (roughly  average)  values*  a  =  0.30  and  k  =  10  are  shown 
in  Figure  4(a)  in  which  N  denotes  the  probability  (negligible  for  this 
example)  that  there  are  no  active  effects.  The  results  from  a  sensitivity 
analysis  in  which  a  and  k  were  altered  to  vary  over  the  ranges  mentioned 
above  is  shown  in  Figure  4(b). 


It  will  be  seen  that  Figure  4(a)  points  to  the  conclusion  that  active 
effects  are  associated  with  columns  3,  5  and  12  of  the  design  and  that  column 
8  might  possibly  also  be  associated  with  an  active  factor.  Figure  4(b) 
suggests  that  this  conclusion  is  very  little  affected  by  widely  different 
choices  for  a  and  k.  Further  research  with  different  choices  of  prior, 
with  marginization  with  respect  to  k,  and  with  different  choices  of  the 
distribution  assumptions  is  being  conducted. 


* 

For  three-level  and  mixed  two  and  three  level  designs  for  example,  this 
analysis  is  carried  out  after  the  effects  are  scaled  so  that  they  all  have 
equal  variances. 
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n 


L  2  3  4 

— 1 f 

5  6  7  8' 

isitivity  ana 

lysis  for  poste 

=  .15  -  .45, 

k  =  5  -  15. 

4i 

7-V-  •-*.  •  ■  -  ' 

'•  ■-*-  --I.,  v.1 

7.  OTHER  RESEARCH 


Topics  which  are  emphasized  in  Taguchi's  approach  to  "off  line  quality 
control"  are  (a)  reduction  of  variation  by  error  transmission  studies  and  (b) 
the  choosing  of  a  product  design  so  that  it  is  robust  with  respect  to 
environmental  variables. 

These  topics  are  also  receiving  attention  in  further  research. 
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1. 


Combined  arms  models  of  combat.  The  classical  Lanchester  model  of 


military  combat  is  defined  by  the  equations 

x  3  -ay  , 
y  =  -bx  , 

where  x(b)  is  the  strength  of  the  X-force  at  time  t  and  y(b)  is  the 
strength  of  the  Y-force  at  time  t  .  If  we  are  interested  in  adjoining 
logistics  considerations  to  combat  models  it  is  more  realistic  to  start  with 
combined  arras  models  of  combat. 

A  general  combined  arras  model  of  Lanchester  type  can  be  formulated  in  the 

following  way.  The  X-force  is  assumed  to  have  m  units  of  strengths 

x1(t),....x  (t)  at  time  t  and  the  Y-force  has  n  units  of  strengths 
1  m 

y^ (t ), . . . ,y^ (t )  at  time  t  .  These  units  raay  be  of  different  types.  Let 

Let  x  =  (x.  ,.,.,x  )  ,  y  =  (y« *•••»▼  )  •  The  combat  between  X  and  Y 
1  m  J 1  ’•'n 

forces  is  governed  by  the  equations 


j 

f  X  "  -Ay  , 

(1) 

|  • 

1 

k  y  =  -Bx  , 

where 

A  is  an  mxn 

matrix 

and 

B  an  nxm 

matrix.  We  also  have  A  >  0 

and  B 

>  0  ,  i.e.  A 

and  B 

are 

nonnegative 

matrices.  The  properties  of  the 

solution  of  the  system  (1)  depend  entirely  upon  the  matrix 

0  -A 

M  « 

l_B  °J 

an  (n+ro)x (n+ra)  matrix,  subject  to  initial  conditions 


x(0)  =  x0  =  , 


y(0) 


(y10,,,,,yn0)  * 


We  have  x^  >  0  ,  y^  >  0  . 

In  such  a  model  the  elements  of  the  matrixes  A  and  B  have  the 


following  forms:  a.,  -  X..ol..  and  b.. 

LJ  tj  ij  lj 


U..6*.  .  The  number  a.,  repre- 


seats  the  efficiency  of  the  unit  y^  of  the  Y-force  when  used  against  the 

unit  of  the  X-force  .  On  the  other  hand,  the  number  represents 

rhe  fraction  of  the  firepower  of  unit  y^  directed  against  the  unit  ^  by 

m 

the  Y-commander .  We  may  suppose  that  £  X. .  s  1  .  The  numbers  3..  and 

i*i 

U..  are  similary  defined.  The  numbers  a.,  and  8..  are  analogous  to  the 
tj  tj 

coefficients  a  and  b  in  the  classical  Lanchester  model — thus  we  may  call 

them  attrition  rate  coefficients.  The  numbers  X..  and  y. .  (note  that 

ij  *0 

n 

T  y .  .  *  1  for  each  i)  represent  a  priori  choices  which  must  be  made  by  the 
i-i  lJ 

commanders  of  the  Y-force  and  the  X-force,  respectively. 

An  element  a.,  of  the  matrix  A  can  be  zero  if  either  the  unit  y.  is 
1 J  J 

ineffective  against  the  unit  or  if  the  Y-comraander  elects  not  to  use 

unit  y^  against  unit  .  A  similar  meaning  is  attached  to  an  element 

b . .  of  B  being  zero. 

1 J 

An  example  of  a  combined  arms  model  is  the  following  which  includes  X 
and  Y  units  of  four  types: 

(i)  Direct  fire  combat  unit. 

(ii)  Artillary  unit; 

(a)  direct  support, 

(8)  counter  battery  fire, 

(y)  air  defense  supression. 
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(iii)  Air  support  unit; 

(a)  close  support, 

(0)  air  defense  suppression, 

(y)  artillery  suppression. 

(iv)  Air  defense  unit. 

Here  we  have  listed  the  functions  each  unit  can  perform.  The  model  then  takes 
the  form 


and  Bj  has  the  same  form  as  A^  .  The  matrix  is  reducible  so  the  model 

can  be  subdivided  into  two  submodels.  The  main  submodel  has  matrix 
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A  satisfactory  theory  can  be  developed  if  we  assume  M  is  a  Lanchester 
Matrix  (see  Maybee  1984).  This  means  A  and  B  are  regular,  i.e.  each  row 
and  column  contains  at  least  one  nonzero  element,  and  M  is  irreducible.  In 
the  above  example  the  matrix  M  is  a  Lanchester  Matrix. 

2.  Supply  models.  For  the  simplest  supply  models  we  adjoin  to  the  system  (2) 
supply  vectors  Sx(t)  31  (Sx^  , . . . ,  Sx^)  ,  and  Sy(t)  *  (Sy^ , . . .  ,Syn)  .  Here 
Sx^(t)  is  the  supply  level  of  unit  x^  of  X-force  at  time  t  .  The 
function  Sy^(t)  has  a  similar  meaning. 

The  equations  of  combat  are  supplemented  by 

(2)  Sx  =  -D jX,  Sy  *  -D2y 

where  and  are  diagonal  matrices  of  size  mxra  and  nxn  ,  respec¬ 

tively,  with  positive  diagonal  elements.  Thus  the  supply  levels  of  the  units 
diminish  with  time  at  a  rate  proportional  to  the  unit  level. 

The  system  of  equations  (1),  (2)  is  to  be  solved  subject  to  the  following 
stopping  rules.  If  a  unit  level  or  supply  level  falls  below  an  acceptable 
percentage  of  initial  unit  or  initial  supply  level,  then  that  unit  is 
withdrawn  from  combat. 

Note  that  in  this  model,  once  the  equations  (1)  are  solved,  equations  (2) 

can  be  explicitly  integrated  to  determine  the  vectors  Sx(b)  and  Sy(t)  . 

Thus  the  theory  of  system  (1)  is  immediately  applicable  to  it. 

A  more  sophisticated  supply  model  can  be  developed  as  follows.  Define 

the  vectors  u(t)  *=  (u,(t),...,u  (t))  and  v(t)  *  (v.(t),...,v  (t))  which 

Ip  1  q 

represent  supply  lines  for  the  X-force  and  Y-force,  respectively.  Here  u^(t) 
is  the  capacity  of  the  i-th  X-force  supply  line  at  time  t  .  The 
function  v.(t)  has  a  similar  meaning. 


In  this  version  of  the  model  we  assume  supplies  are  brought  in  as  combat 


goes  on.  Also  each  commander  directs  fire  against  opposing  supply  lines. 
Thus  to  the  equations  (1)  we  now  adjoin  the  equations 

f  Sx  9  -D^x  +  Au  , 

( 2  1 )  < 

l  Sy  =  -D^y  +  Bv 

and 

f  u  =  -Ey  , 


Here  A  is  an  mxp  matrix,  B  is  nxq  matrix,  E  is  a  pxn  matrix,  and 
F  is  a  qxm  matrix.  All  of  these  matrices  are  nonnegative.  Initial  values 
are  given  for  Sx,  Sy,  u  and  v  where  Sx(Q)  >  0  ,  Sy(0)  >  0  ,  u(0)  >  0  , 

v(0)  >  0  ,  The  elements  of  the  matrices  A,  B,  E,  and  F  have  the  same  form 
as  those  in  the  matrices  A  and  B  .  This  is  because  each  commander  makes  a 
priori  decisions  as  to  how  he  uses  his  supply  lines  to  bring  in  supplies  and 
also  how  much  of  his  firepower  he  directs  against  each  of  his  opponents' 
supply  lines. 

Note  that  in  this  model  it  is  still  true  that  the  solution  of  the  system 
(1)  completely  determines  the  entire  model.  Once  the  system  (1)  has  been 
solved  the  system  (3)  can  be  integrated  directly  to  determine  the  vectors  u 
and  v  .  Then  the  system  (2*)  can  be  integrated  to  determine  the  vectors  Sx 
and  Sy  . 

Nevertheless  we  have  a  significant  new  issue  introduced  here.  TTiis  is 
because  of  the  question  of  how  much  firepower  should  be  used  against  supply 
lines  versus  how  much  is  used  against  opposing  combat  units.  This  version  of 


the  model  permits  us  to  evaluate  the  effects  of  such  choices  upon  the  outcome 
of  combat.  The  previous  simpler  version  only  allows  us  to  evaluate  the 
effects  of  supply  levels  upon  the  outcome  of  combat. 

A  very  sophisticated  supply  model  can  now  be  formulated.  We  now  assume 
that  the  supply  lines  are  also  used  to  reinforce  the  combat  units.  We  then 
have  an  entirely  new  model  having  the  form 


(4) 


x  =  -Ajy  +  A2u  ,  u  =  -Ey  , 

/ 

y  =  -BjX  +  B2v  ,  v  =  -Fx  , 

Sx  3  -DjX  +  Au  , 

\  #  *' 

Sy  3  -D2y  +  Bv  . 


Here  is  an  mxp  matrix  and  is  an  nxq  matrix,  0  ,  >  0  . 

The  elements  of  these  matrices  have  the  same  form  as  the  matrices  A^  and 
B^  ,  that  is,  they  are  the  products  of  coefficients  which  define  the 
capability  of  each  supply  line  to  furnish  the  given  type  of  reinforcements 
(men,  tanks,  etc.)  with  a  number  on  the  interval  [0,1]  which  represents  the 
fraction  of  supply  line  capacity  devoted  to  such  reinforcements. 

The  matrix  of  the  system  (4)  is 


0 

A2 

"A1 

0 

0 

0 

-E 

0 

-B1 

0 

0 

B2 

_-F 

0 

0 

0  _ 

and  the  system  to  be  solved  is 


(5) 


We  call  N  a  Lanchester  logistics  matrix  if  each  of  the  matrices  A^,  B^, 

^2’  **  *8  re8ular  and  N  is  irreducible.  Of  course,  systems  for  which 

N  is  reducible  can  be  decomposed  into  smaller  irreducible  systems.  Also  once 
(5)  has  been  solved,  we  may  again  determine  Sx  and  Sy  by  integration. 

Preliminary  results  show  that  the  basic  properties  of  the  solution  of  (5) 
depend  only  upon  the  structure  of  the  matrix  N  .  Thus  all  such  logistics 
systems  can  be  expected  to  have  similar  solutions. 


3.  Issues  that  can  be  addressed  by  such  models.  It  is  important  to 
understand  first  how  our  models  should  be  used.  Because  of  the  fact  that  a 
large  set  of  a  priori  decisions  must  be  made  by  ech  commander  with  regard  to 
how  he  allocates  his  firepower  and  how  he  uses  his  supply  lines,  it  is 
reasonable  to  suppose  that  a  given  matrix  M  or  N  will  apply  from  time  0 
to  the  first  time  t^  at  which  one  of  the  commanders  changes  his  allocations. 
Then  M  is  replaced  by  a  new  matrix  H  (N  by  N)  and  a  model  of  the  same 
general  form  holds  until  time  t ^  •  Of  course,  the  initial  values  for  the 
interval  [t^jtj]  are  the  same  as  the  final  values  at  t^  using  the  matrix 
M  or  N  .  Thus  a  lengthy  combat  can  be  modeled  as  a  sequence  of  such 
models.  We  can  even  use  our  models  for  combat  which  lasts  over  a  period  of 
days  or  weeks  with  intermittent  periods  of  quiet  (say  at  night)  during  which 
supplies  or  reinforcements  are  brought  in.  Then  the  initial  conditions  for 
[tptjl  would  not  necessarily  be  the  same  as  the  final  conditions  on  [0,t^]. 
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A  variety  of  problems  may  be  solved  using  these  models.  We  may 
investigate  the  effect  of  a  priori  decisions  made  by  the  commanders  upon  the 
battle  progress.  To  do  this  it  is  necessary  to  devise  various  measures  of 
combat  effectiveness  (see  Willis  1982  for  a  variety  of  such  measures  applied 
to  classical  combat  models).  We  can  study  the  result  of  invoking  various 
stopping  rules  and  the  effect  of  initial  force  sizes  on  battle  progress.  All 
of  these  issues  can  be  studied  using  combined  arms  models. 

The  issues  mentioned  above  are  also  relevant  to  the  various  supply 
models*  But  we  may  use  the  supply  models  to  try  and  understand  the  answers  to 
other  questions.  For  exmaple,  what  is  the  relation  between  combat  levels  and 
supply  levels  and,  in  particular,  what  are  the  optimal  initial  supply  levels? 
How  should  suply  line  capacities  be  allocated  so  as  to  insure  against  having 
to  withdraw  from  combat  because  of  inadequate  supplies?  Conversely,  what  is 
the  most  effective  allocation  of  fire  power  between  enemy  units  and  enemy 
supply  lines?  Deeper  issues  concern  questions  such  as  when  a  commander  should 
change  his  allocations  and  how. 
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OVERVIEW  OF  EXPERT  SYSTEMS 


Capt  Stephen  E.  Cross,  USAF 
Artificial  Intelligence  Laboratory 
Air  Force  Institute  of  Technology 
Wright-Patterson  AFB,  Ohio  45433 


Introduction 

An  expert  system  is  a  computer  program  that  embodies  the 
knowledge  of  a  human  expert  and  a  reasoning  process  (which  may  be 
suggestive  of  the  human  expert's  reasoning  process)  to  perform 
some  problem  solving  task  that  is  usually  deferred  to  a  human 
expert.  Often,  these  programs  are  referred  to  as  knowledge  based 
programs  or  intelligent  assistants.  Many  programs  can  be  con¬ 
sidered  to  perform  some  expert  task.  For  instance,  a  FORTRAN 
program  that  computes  a  Fourier  transform  accomplishes  something 
that  would  be  difficult  for  a  human  to  do.  What  distinguishes 
expert  systems  in  the  artificial  intelligence  context,  is  that 
these  programs  use  the  same  type  of  experiential  knowledge  as  do 
their  human  counterparts.  A  significant  architectural  character¬ 
istic  is  that  this  knowledge  is  contained  in  a  separate  knowledge 
base. 


Expert  systems  can  perform  many  tasks.  A  taxonomy  of  tasks 
include:  prediction,  diagnosis,  design,  planning,  monitoring, 
interpretation,  debugging,  repair,  instruction,  and  control. 

The  type  of  knowledge  that  human  experts  use  can  be  divided 
into  three  categories:  facts,  heuristics,  and  beliefs.  Facts  are 
perhaps  the  easiest  form  of  knowledge  to  visualize.  They  are  just 
static  pieces  of  data  which  are  thought  to  be  true.  For  instance, 
a  fact  is  'the  wing  span  of  a  T-38  is  24  feet.'  Heuristics  are 
pieces  of  expertiential  knowledge  which  are  most  often  stated  in 
the  form  of  production  rules.  Heuristics  are  'rules  of  thumb'  or 
gut  feelings  that  are  acquired  throughout  the  course  of  a  career. 
They  are  rarely  recorded  in  text  books  or  professional  articles. 

An  example  heuristic  is  taken  from  MYCIN  [ref  1]  an  expert  system 
that  performs  the  diagnosis  and  recommended  treatment  task  in 
infectious  blood  disease. 

If  1)  the  infection  is  primary-bacteremia, 

2)  the  site  of  the  culture  is  one  of  the  sterile  sites, 

3)  the  suspected  portal  of  entry  of  the  organism  is  the 
gastrointestinal  tract. 

Then  there  is  suggestive  evidence  (cf=.7)  that  the  identity 
of  the  organism  is  bacteroides. 

The  computer  representation  of  this  rule  would  look  like: 

(IF  (AND  (SAME  CNTXT  INFECT  PRIMARY-BACTEREMIA) 

(MEMBF  CNTXT  SITE  STERILESITES) 

(SAME  CNTXT  PORTAL  GI)) 


(TH  (CONCLUDE  CNTXT  IDENT  BACTEROIDES  TALLY  .7) 

An  expert  system  requires  knowledge  of  belief.  Belief 
enables  the  computer  to  decide  how  much  credibility  to  attach  to 
facts  or  heuristics.  Quite  often  belief  has  been  represented 
probablistically,  but  symbolic  representations  of  belief  are  now 
becoming  popular.  MYCIN's  use  of  certainty  factors  is  typical  of 
numerical  belief  representation.  Belief  is  mapped  into  the  range 
(-1,11  where  1  represents  being  certain  something  is  true,  '1 
represents  being  certain  something  is  not  true,  and  0  represents 
the  lack  of  any  knowledge  to  believe  or  not  believe  something. 
The  above  rule  has  a  cf  of  .7  which  indicates  a  fairly  certain 
level  of  belief  (equivalent  to  a  physician  saying  'I'm  fairly 
certain').  An  algorithm  is  used  to  combine  cf's  during  a  search 
process  for  applicable  rules  so  that  the  path  with  the  highest 
combined  cf  is  evaluated  first. 

An  Example. 

As  a  prototypical  expert  system,  I  will  discuss  the  animal 
production  system  of  Winston  and  Horn  (ref  21.  This  system  is 
used  to  identify  animals  in  a  zoo.  Although  simple,  it  illus¬ 
trates  how  more  complicated  systems  like  HYCIN  operate.  An  under¬ 
standing  of  production  system  operation  is  a  prerequisite  for 
understanding  more  complicated  rule-based  system  architectures. 

A  production  system  consists  of  a  rule-base,  a  data  base, 
and  a  control  program.  The  rule  base  is  the  repository  of  all 
heuristics.  In  theory,  the  rule  base  is  unordered.  That  is,  there 
is  no  significance  in  contiguous  rules.  Some  systems  (e.g., 
MYCIN)  include  certainty  factors  which  are  processed  to  give  a 
measure  of  belief.  In  the  animal  system  there  is  no  uncertainty, 
hence  there  is  no  need  for  certainty  factors.  The  data  base  in 
MYCIN  consists  of  facts  gleared  from  the  patient  history  (e.g., 
the  subject  smokes  3  packs  a  day)  and  results  of  laboratory 
tests.  In  the  animal  system  the  data  base  is  a  list  of  symbolic 
facts.  The  list  is  a  respository  for  known  characteristics  of 
animals.  The  control  structure  uses  backward  chaining.  For 
instance,  if  the  computer  wanted  to  deduce  that  the  patient  had  a 
particular  disease,  it  would  obtain  a  list  of  rules  that  made 
conclusions  about  that  disease.  Then  the  antecedents  of  each  such 
rule  would  be  tested.  Antecedents  for  which  there  is  insufficient 
data  would  be  defined  as  subgoals  and  rules  that  made  conclusions 
about  these  new  subgoals  would  be  accessed,  hence  backward  chain¬ 
ing. 

Many  other  control  schemes  are  possible.  The  process  of 
backward  chaining  is  often  called  goal-directed  search.  Another 
process,  forward  chaining,  seeks  to  invoke  rules  whose 
antecedents  presently  match  the  data  base.  This  strategy  is 
called  data-driven  search.  Combinations  of  backward  and  forward 
chaining  are  often  employed  in  production  systems.  McDermott  (ref 
3)  has  proposed  several  variations.  Rather  than  a  pure  forward  or 
backward  search,  McDermott  suggests  keeping  track  of  rules  that 
have  been  applied  successfully  in  the  past  and  trying  them  first. 
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or  trying  rules  first  that  have  what  are  judged  to  be  computa¬ 
tionally  tractable  antecedents. 

The  animal  production  system  consists  of  lists  of  facts  and 
rules.  The  following  trace  will  illustrate  some  of  the  basic 
concepts. 

;  the  data  base  initially  consists  of  these  known  facts 
(setq  facts  '((animal  has  hair) 

(animal  eats  meat) 

(animal  has  tawny  color) 

(animal  has  dark  spots))) 

;the  rule  base  consists  of  rules  like 

(setq  rules  '((rule  idl  (if  (animal  has  hair)) 

(then  (animal  is  mammal))) 

(rule  id2  (if  (animal  gives  milk)) 

(then  (animal  is  mammal))) 

(rule  id3  (if  (animal  has  feathers)) 

(then  (animal  is  bird))) 

(ruleid4(if  (animal  flies) 

(animallays  eggs)) 

(then  (animal  is  bird))) 

(rule  id5  (if  (animal  eats  meat)) 

(then  (animal  is  carnivore))) 

(rule  id6  (if  (animal  has  pointed  teeth) 

(animal  has  claws) 

(animal  has  forward  eyes)) 
(then  (animal  is  carnivore))) 

(rule  id7  (if  (animal  is  mammal) 

(animal  has  hoofs)) 

(then  (animal  is  ungulate))) 

(rule  id8  (if  (animal  is  mammal) 

(animal  chew  cud)) 

(then  (animal  is  ungulate) 

(animal  is  even  toed))) 

(rule  id9  (if  (animal  is  carnivore) 

(animal  has  tawny  color) 
(animal  has  black  stripes)) 
(then  (animal  is  cheetah))) 


The  animal  production  system  can  be  run  in  the  forward  or 
backward  chaining  mode.  In  the  forward  mode,  a  search  will  be 
made  for  rules  whose  left  side  match  the  data  stored  in  the  facts 
list.  Any  rule  that  matches  (and  whose  right  side  has  not  been 
previously  written  into  the  list)  will  have  its  right  side  added 
to  the  list.  The  search  continues  until  no  more  rules  are  appli¬ 
cable.  In  this  system  the  function  deduce  accomplishes  this 
search. 

(deduce) 

rule  idl  deduces  (animal  is  mammal) 


rule  id5  deduces  (animal  is  carnivore) 
rule  id9  deduces  (animal  is  cheetah) 

At  this  point  the  facts  list  becomes: 

((animal  has  hair)  (animal  eats  meat) 

(animal  has  tawny  color)  (animal  has  dark  spots) 

(animal  is  mammal)  (animal  is  carnivore) 

(animal  is  cheetah)) 

The  system  can  be  run  in  a  backward  chaining  mode  by  estab¬ 
lishing  a  list  of  hypotheses. 

(setq  hypotheses  '((animal  is  cheetah) 

(animal  is  ostrich) 

(animal  is  penguin) 

(animal  is  cow) 

(animal  is  elephant) 


A  search  is  made  for  rules  whose  left  side  support  each 
hypothesis.  The  function  that  performs  this  task  is  diagnose. 

(diagnose) 

Is  it  true:  (animal  has  feathers)?  no  ;the  user  responds 
Is  it  true:  (animal  flies)?  no 
Is  it  true:  (animal  has  hair)?  yes 
rule  idl  deduces  (animal  is  mammal) 

Care  to  know  how?  yes 

(animal  is  mammal)  demonstrated  by:  (animal  has  hair) 

Is  it  true:  (animal  has  hoofs)?  why  ;the  user  can 
;ask  why  such  a  question  was  asked 

(animals  has  hoofs)  needed  to  show  (animal  is  ungulate) 

Is  it  true:  (animal  has  hoofs)?  yes 
rule  id7  deduces  (animal  is  ungulate) 

Care  to  know  how?  no 

Is  it  true:  (animal  has  black  stripes)?  yes 
rule  idl2  deduces  (animal  is  zebra) 

Care  to  know  how?  no 

Hypothesis  (animal  is  zebra)  is  true. 

any  other  questions?  no 

The  production  system  approach  has  several  advantages.  The 
knowledge  base  is  separate  from  the  control  program,  hence  it  is 
easy  to  modify,  add,  or  delete  knowledge.  Modification  of  the 
knowledge  base  does  not  inhibit  operation  of  the  computer  pro¬ 
gram.  Knowledge  has  a  uniform  representation. 

A  major  advantage  of  the  production  system  approach  is  that 
the  program  can  'explain'  its  solutions  by  reciting  some  portion 
of  the  rules  that  were  used  in  the  reasoning  process.  For 
example,  a  backward  chaining  production  system  interprets  ques¬ 
tions  like  'How?'  to  mean  'How  did  I  reach  this  conclusion'  and 


will  list  the  rules  that  were  instrumental  in  the  decision. 
Questions  like  'Why?'  (why  was  this  question  asked)  are  answered 
by  listing  the  antecedents  of  the  rule  to  which  the  questions 
context  is  the  subgoal.  Sample  questions  and  answers  are  shown  in 
the  NYCIN  trace  above.  It  should  be  noted  that  this  is  a  very 
primitive  form  of  computer  understanding.  A  truly  intelligent 
explanation  facility  would  have  powerful  'truth  maintenance' 
facilities  and  access  to  first  principles.  An  interesting  case 
study  was  conducted  with  NYCIN  about  four  years  ago.  It  was 
decided  that  since  MYCIN  was  so  splendid  at  diagnosis  and  had  an 
explantion  capability,  that  it  would  make  a  good  medical  instruc¬ 
tor.  But  it  was  soon  found  that  an  experiential  knowledge  base 
was  deficient  at  explaining  many  of  the  causal  relationships  in 
medicine.  For  instance,  MYCIN  has  a  rule  that  says  'don't 
administer  tetracycline  to  children  under  eight  years  of  age.'  No 
where  in  the  knowledge  are  the  facts  that  tetracycline  inhibits 
bone  development,  a  physiological  piece  of  knowledge. 

There  are  several  problems  with  production  systems  at  the 
present.  As  stated  above,  an  expert  system  needs  access  to  large 
stores  of  knowledge.  Some  of  the  knowledge  is  experiential  and  is 
probably  best  represented  as  production  rules.  Other  knowledge 
concerns  domain  theories.  CASENET  [ref  41  represents  causal  rela¬ 
tionships  in  internal  medicine.  MDX  (ref  5]  orders  disease  pro¬ 
cesses  in  a  tree  structure  and  records  at  each  node  only  that 
knowledge  that  is  required  to  establish  the  existence  of  that 
disease  process.  Cross'  air  traffic  control  system  (ref  61  uses  a 
network  representation  for  control  algorithms.  New  heuristics  are 
justified  by  propagating  values  (the  effect  of  applying  heuris¬ 
tics)  throughout  the  network. 

NYCIN  led  to  the  development  of  many  expert  systems.  Since 
the  knowledge  base  was  separate  from  the  program,  it  is  possible 
in  many  applications  to  simply  insert  a  new  knowledge  base  for  a 
different  domain.  PUFF  (ref  7]  an  expert  system  for  the  diagnosis 
of  pulmonary  lung  disease,  was  written  in  EMYCIN  (essentially 
MYCIN).  PUFF  demonstrates  that  a  new  expert  system  can  be  built 
in  a  fairly  short  period  of  time  provided:  1)  the  domain  of 
application  is  not  sufficiently  changed,  and  2)  the  people  build¬ 
ing  the  expert  system  have  experience  in  building  knowledge  based 
systems.  PUFF  was  created  in  about  100  man  hours  by  a  team  of 
expert  knowledge  engineers  and  physicians. 

One  major  bottleneck  in  expert  system  design  is  knowledge 
acquisition.  The  speed  at  which  an  expert  system  can  be  built  is 
directly  related  to  the  skill  and  experience  of  the  expert  system 
builder,  commonly  called  the  knowledge  engineer.  It  is  his  job  to 
become  sufficiently  conversant  in  the  domain  to  talk  intelligent¬ 
ly  with  a  cooperative  domain  expert,  and  to  obtain  the  heuristics 
that  the  domain  expert  uses  to  do  difficult  problem  solving.  It's 
a  paradox  that  domain  experts  have  only  vague  ideas  of  the  actual 
heuristics  that  they  use.  This  is  why  training  programs  like 
medicine,  law,  etc.  take  many  years.  Michie  (ref  81  relates  an 
interesting  example.  A  cheese  manufacturing  company  in  England 
relied  on  the  skill  of  an  elderly  gentleman  to  do  quality  control 


of  its  products.  He  could  assess  the  quality  of  cheese  by  probing 
his  finger  through  the  wax  seal  and  feeling  the  cheese  inside. 
Because  of  his  prowness  and  the  fact  that  he  was  quite  elderly, 
the  company  management  wanted  to  automate  his  expert  behavior. 
They  brought  in  many  mechanical  engineers  to  attempt  to  build  a 
device  which  could  probe  the  cheese  in  the  same  manner  as  the 
expert.  What  the  expert  did  not  realize  and  hence  was  unable  to 
verbalize  to  the  system  builders,  was  that  his  probing  was  simply 
a  mental  device  he  employed  to  focus  his  sense  of  smell  on  the 
particular  cheese  in  front  of  him. 

Another  problem  with  rule  bases  is  that  they  become  large 
and  search  becomes  computationally  expensive.  TIERESIAS,  the 
subject  of  Randall  Davis'  PhD  work  [ref  91,  used  meta-rules  to 
guide  the  invocation  of  domain  rules.  A  meta-rule  is  simply  a 
production  rule  that  makes  conclusions  about  domain  rules.  An 
example  which  was  used  in  conjunction  with  a  MYCIN-like  system 
for  investment  advice  is  shown  below: 

If  1)  the  age  of  the  investor  is  greater  than  50, 

2)  the  investor  is  not  independently  wealthy. 

Then  there  is  evidence  (cf*1.0)  that  only  stocks  that 
have  high  dividends  should  be  considered. 

Another  approach  to  knowledge  base  organization  was  offered 
by  Aikins  [ref  10]  in  her  PhD  dissertation.  She  noted  the  search 
inefficiency  problems  in  PUFF  and  developed  a  frame  based  expert 
system  where  rules  were  organized  into  disease  groups. 

The  final  problem  in  rule  bases  which  will  be  discussed  here 
is  belief  justification.  It  is  very  important  that  only  one 
domain  expert  be  consulted  in  the  creation  of  a  new  knowledge 
base.  Often  one  expert's  heuristic  will  contradict  another  ex¬ 
pert's.  The  computer  at  present  has  no  mechanism  for  truth  main¬ 
tenance  [ref  11]  although  research  into  this  area  is  proceeding. 

An  Ideal  Architecture. 

One  should  remember  that  a  production  system  is  only  a 
simple  architecture  of  an  expert  system.  The  anatomy  of  an 
'ideal'  expert  system  is  shown  in  Fig.  1  [ref  12:171.  The  system 
consists  of  a  natural  language  front-end  to  facilitate  communica¬ 
tion  between  the  computer  and  the  user.  Brooke's  thesis  [ref  131 
describes  a  universal  natural  language  front-end  for  expert  sys¬ 
tems.  The  blackboard  is  used  to  record  intermediate  results 
posted  for  use  by  many  knowledge  bases  which  may  be  operating  in 
parallel.  The  knowledge  base  contains  facts  as  well  as  heuristic 
problem-solving  rules.  The  interpreter  controls  how  the  knowledge 
base  is  searched.  The  enforcer  adjusts  previous  conclusions  when 
new  data  or  knowledge  alter  their  bases  of  support.  The  justifier 
rationalizes  and  explains  the  system's  behavior. 


Hearsay  II  (ref  14],  a  system  designed  to  do  speech  recogni¬ 
tion,  embodies  many  of  the  concepts  from  the  ideal  architecture. 
Hearsay  II  was  organized  as  a  body  of  cooperating,  independent 
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Ideal  Architecture 
Figure  1 


specialists.  Each  specialist  used  knowledge  that  was  unique  to 
the  speech  recognition  task.  For  instance,  there  was  a  knowledge 
base  of  rules  for  inferring  phonemes  and  a  separate  knowledge 
base  of  grammatical  rules. 

Many  other  architectures  exist.  0PS5  [ref  15]  allows  control 
rules  to  be  specified  which  allow  different  search  strategies  to 
be  invoked  in  different  contexts.  R1  (ref  16]  is  a  VAX  computer 
configuration  expert  system  written  in  0PS5.  ROSS  [ref  171  facil¬ 
itates  communicate  between  small  experts  (or  actors)  through  a 
message  passing  language.  ROSIE  [ref  18]  provides  a  structured 
English  interface  to  facilitate  representation  of  rules.  All  of 
these  systems  are  variations  on  the  production  system  theme. 

Expert  System  Advice. 

Penny  Nii  (an  expert  expert  system  builder)  has  offered  some 
practical  advice  for  those  wishing  to  build  an  expert  system  [ref 


1.  Don't  be  your  own  expert.  It  is  hard  to  be  objective 
about  your  own  'expert'  knowledge. 

2.  The  problem  must  be  well  chosen.  AI  is  not  the  answer  to 
every  problem.  Expert  systems  work  best  when  the  problem  is  well 
bounded.  This  means  that  while  we  can  represent  large  amounts  of 
problem  specific  knowledge  we  do  not  have  a  good  handle  on  repre¬ 
senting  general  world  knowledge. 

3.  You  need  to  meet  the  human  expert  more  than  halfway.  Nii 
begins  a  new  expert  system  building  task  by  reading  all  the 
literature  in  the  application  domain. 

4.  if  none  of  the  tools  that  you  have  available  will  work, 
build  one. 

5.  One  needs  a  way  to  handle  uncertainty.  A  weighting 
process  must  be  built  in  that  handles  facts  or  knowledge  like  'I 
strongly  believe  ....'  or  'It  might  cause  . ' 

6.  The  program  must  have  easy  means  of  knowledge  base 
modification.  The  program  must  be  able  to  explain  its  answers. 
Both  imply  that  if  the  expert  is  to  be  a  personal  assistant  to  a 
human,  that  it  should  have  a  useful  natural  language  front  end. 


There  are  some  serious  limitations  to  expert  systems  at  the 
present  time.  Expert  system  techniques  have  to  date  only  been 
successful  in  domains  where  the  experiential  knowledge  of  the 
expert  could  be  decoupled  from  the  world  and  common-sense 
knowledge  of  the  expert.  These  programs  tend  to  be  idiot  savants 
in  that  they  neither  recognize  an  interesting  problem  or  solution 
and  degrade  quickly  near  the  fringes  of  their  knowledge.  For 
instance,  MYCIN  fails  when  mutliple  diseases  are  present  in  the 


body  causing  some  infectious  blood  disease  symptoms  to  be  masked. 
Another  limitation  is  that  to  date,  expert  systems  have  only  been 
successfully  applied  to  domains  that  are  very  narrow.  The  com¬ 
puter  does  a  fantastic  job  at  'deep'  inferencing.  We  do  not  have 
the  capabilities  to  represent  'broad'  knowledge  in  useful  ways. 
Expert  systems  do  not  have  the  capability  to  do  common  sense 
reasoning.  For  instance,  an  expert  system  for  the  flight  domain 
might  represent  emergency  checklist  'scripts'.  One  of  these 
states  that  when  the  cabin  depressurizes,  descend.  However,  an 
intelligent  being  would  immediately  rule  out  a  descent  when 
flying  in  the  mountains. 

All  Is  Uai  Iflfii. 

Even  with  the  limitations,  there  are  many  successful  appli¬ 
cations.  Measures  of  success  are  1)  the  number  of  companies  that 
are  building  expert  systems  for  internal  use  (e.g.,  Westinghouse, 
General  Electric,  NCR),  2)  the  amount  of  venture  capital  avail¬ 
able  to  build  systems  for  stock  market  analysis,  etc.,  and  3)  the 
huge  salaries  available  to  knowledge  engineers  (up  to  $70,000  on 
the  west  coast).  We  conclude  the  section  on  expert  systems  with  a 
listing  of  the  known  expert  systems.  Much  of  the  list  is  taken 
from  Iref  19]  and  is  supplemented  with  systems  that  we  have 
worked  on. 

Air  Force  Institute  of  Technology 

-  ATC  (an  air  traffic  control  expert  system  framework) 

-  Maintenance  expert  systems  (battle  damage  assessment, 
circuit  card  diagnosis,  tech  order  automation) 

-  Military  planning  (several  military  planning  aids) 

-  Natural  language  front  ends  to  expert  systems 

-  Pilot  aids,  ongoing  research  in  advanced  expert  system 
architectures 

-  SPEREXAS  (a  speech  recognition  system) 

Bioengineering 

-  MOLGEN  (genetic  experiment  planning  aid) 

Chemistry 

-  DENDRAL  (interprets  mass  spectrometer  data) 

-  DECS  (organic  synthesis  planning) 

Computer  Systems 

-  DART  (diagnosis  of  computer  faults) 

-  R1  (configure  VAX  systems) 

-  SPEAR  (analysis  of  computer  error  logs) 

-  XSEL  (assists  sales  people  in  selecting  appropriate 
computer  systems) 


Engineering 


-  SACON  (aids  structural  engineers) 

General  Purpose 

-  AGE  (guides  development  of  expert  systems  involving 

hypothesis  formation  and  information  fusion) 

-  AL/X  (assists  diagnostic  experts) 

-  EMYCIN  (MYCIN  without  the  knowledge  base) 

-  EXPERT  (an  inference  system  used  in  oil  exploration  tasks) 

-  KAS  (an  experimental  knowledge  acquisition  system) 

-  LOOPS  (an  experimental  knowledge  representation  system) 

-  OPS  (a  basic  inference  system) 

-  ROSIE  (a  basic  inference  system) 

-  TIERESIAS  (aids  in  knowledge  acquisiton) 

-  UNITS  (an  early  version  of  LOOPS) 

Law 

-  LDS  (an  experimental  system  that  models  legal  decision 

making) 

-  TAXMAN  (an  experimental  system  that  deals  with  rules 

implicit  in  tax  laws) 

Maintenance 

-  CAT-1  (diagnosis  of  diesel  train  engines) 

Military 

-  AIRPLAN  (an  expert  system  for  air  traffic  control  around 

aircraft  carriers) 

-  HASP  (an  expert  system  for  identification  and  tracking  of 

ships  using  ocean  sonar  signals) 

-  KNOBS  (a  tactical  aircraft  planning  aid) 

-  TATR  (an  expert  system  for  tactical  air  targeteer ing,  uses 

ROSIE) 

-  SWIRL  (a  tactical  aircraft  planning  aid,  uses  ROSS) 

Resource  Exploration 

-  DIPMETER  ADVISOR  (analyzes  information  from  oil  wells) 

-  DRILLING  ADVISOR  (diagnosing  oil  well  drilling  problems) 

-  PROSPECTOR  (evaluates  sites  for  potential  mineral  deposits) 

Medicine 

-  CADUCEUS  (differential  diagnosis  in  internal  medicine) 

-  CASNET  (a  causal  network  that  associates  treatments  with 

various  diagnostic  hypotheses) 

-  MYCIN  (diagnoses  infectious  blood  diseases) 

-  MDX  (uses  compiled  knowledge  to  performed  various  diagnosis 

tasks) 

-  ONCOCIN  (a  management  system  for  cancer  chemotherapy) 

-  PUFF  (diagnosis  of  pulmonary  disorders,  uses  EMYCIN) 
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